Optimal control in partially observable complex social systems
Lepri, Bruno;
2020-01-01
Abstract
We live in a world full of complex social systems. Achieving optimal control in such a system is challenging because both modeling and optimization are difficult. To capture complex social system dynamics accurately and succinctly, we model the decision-making problem as a partially observable discrete event decision process. To withstand the curse of dimensionality in high-dimensional belief state spaces and to optimize over a tractable search space, we investigate the connection between the value function of a partially observable decision process and that of the corresponding fully observable process, and we reduce optimal control of a partially observable discrete event decision process to two subproblems: policy optimization in a specially formed fully observable decision process, and belief state estimation. When tested in real-world transportation scenarios against other state-of-the-art approaches, our proposed algorithm yields the lowest average time on the road, the largest number of vehicles at work during work hours, and the fewest training epochs needed to converge to the highest total reward per episode.