Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process

Complex social systems are composed of interconnected individuals whose interactions give rise to group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, controlling such systems is difficult because of high-dimensional, non-linear system dynamics and the exploding state and action spaces facing the decision maker. Prior methods fall into two categories: simulation-based and analytical approaches. Existing simulation approaches suffer from high variance in Monte Carlo integration, while analytical approaches suffer from modeling inaccuracy. We adopt simulation modeling to specify the complex dynamics of a complex system, and develop analytical solutions for searching for optimal strategies in a complex network with a high-dimensional state-action space. To capture the complex system dynamics, we formulate decision making in a complex social network as a discrete event decision process. To address the curse of dimensionality and the search over high-dimensional state-action spaces, we reduce control of a complex system to variational inference and parameter learning, introduce a Bethe entropy approximation, and develop an expectation propagation algorithm. In a real-world transportation scenario, our proposed algorithm achieves higher expected system rewards, faster convergence, and lower variance of the value function than state-of-the-art analytical and sampling approaches.
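The reduction of control to inference described above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it shows the general control-as-inference view on a hypothetical two-state, two-action MDP, where a soft (maximum-entropy) Bellman backup replaces the hard max over actions with a log-sum-exp, and the resulting policy is a softmax over soft Q-values. All transition probabilities and rewards are invented for illustration.

```python
import numpy as np

# Hypothetical toy MDP: P[s, a, s'] = transition probability,
# R[s, a] = reward (all numbers invented for illustration).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.7, 0.3],
               [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9  # discount factor

# Soft value iteration: viewing control as probabilistic inference
# yields a log-sum-exp ("soft max") backup instead of a hard max.
Q = np.zeros((2, 2))
for _ in range(500):
    V = np.logaddexp.reduce(Q, axis=1)  # soft value V(s) = log sum_a exp Q(s, a)
    Q = R + gamma * (P @ V)             # soft Bellman backup

# The inferred policy is a softmax over the soft Q-values.
pi = np.exp(Q - np.logaddexp.reduce(Q, axis=1, keepdims=True))
print(pi)
```

Because action 1 in state 1 earns the highest reward and mostly keeps the system in state 1, the inferred policy concentrates probability on that action, while lower-value actions retain nonzero probability, which is the maximum-entropy flavor of the inference view.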
