Optimal control in partially observable complex social systems

Lepri, Bruno;
2020-01-01

Abstract

We live in a world full of complex social systems. Achieving optimal control in such a system is challenging because both modeling and optimization are difficult. To capture the dynamics of a complex social system accurately and succinctly, we model the decision-making problem as a partially observable discrete event decision process. To withstand the curse of dimensionality in high-dimensional belief state spaces and to optimize the problem in an amenable search space, we investigate the connection between the value function of a partially observable decision process and that of the corresponding fully observable process. This lets us reduce the optimal control of a partially observable discrete event decision process to two parts: policy optimization in a specially formed fully observable decision process, and belief state estimation. When tested in real-world transportation scenarios against other state-of-the-art approaches, our proposed algorithm yields the lowest average time on the road, the largest number of vehicles at work during work hours, and the fewest training epochs needed to converge to the highest total reward per episode.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/320800