Discrete-Time Mean Field Control with Environment States

Multi-agent reinforcement learning methods have shown remarkable potential for solving complex multi-agent problems, but mostly lack theoretical guarantees. Recently, mean field control and mean field games have been established as tractable solution concepts for large-scale multi-agent problems. In this work, driven by a motivating scheduling problem, we consider a discrete-time mean field control model with common environment states. We rigorously establish approximate optimality in the finite-agent case as the number of agents grows, and show that a dynamic programming principle holds, implying the existence of an optimal stationary policy. Since exact solutions are difficult in general due to the continuous action space of the limiting mean field Markov decision process, we apply established deep reinforcement learning methods to solve the associated mean field control problem. We compare the performance of the learned mean field control policy to typical multi-agent reinforcement learning approaches and find that it converges to the mean field performance for sufficiently many agents, verifying the theoretical results and yielding competitive solutions.
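As a minimal sketch of how such a limiting mean field MDP can be handed to off-the-shelf deep reinforcement learning (not the paper's actual model), the hypothetical example below represents the joint state as the pair of mean field and common environment state, and the continuous action as the logits of a decision rule over local agent states. The state/action space sizes N_X, N_U, N_Z, the placeholder transition kernels, and the reward are illustrative assumptions.

# Hypothetical sketch: the limiting mean field MDP as a single-agent
# environment. Observation = (mean field mu over local states, one-hot
# common environment state z); action = logits of a decision rule
# h(u | x), i.e., one distribution over local actions per local state,
# which yields the continuous action space mentioned in the abstract.
# All dynamics and rewards below are illustrative placeholders.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

N_X, N_U, N_Z = 3, 2, 2   # assumed sizes: local states, local actions, env states
HORIZON = 50

class MeanFieldMDP(gym.Env):
    def __init__(self):
        self.observation_space = spaces.Box(0.0, 1.0, shape=(N_X + N_Z,), dtype=np.float32)
        self.action_space = spaces.Box(-5.0, 5.0, shape=(N_X * N_U,), dtype=np.float32)
        self.rng = np.random.default_rng(0)
        # Placeholder local transition kernels P[z, u, x, x'] and env state chain Pz[z, z'].
        self.P = self.rng.dirichlet(np.ones(N_X), size=(N_Z, N_U, N_X))
        self.Pz = self.rng.dirichlet(np.ones(N_Z), size=N_Z)

    def reset(self, *, seed=None, options=None):
        self.t = 0
        self.mu = np.full(N_X, 1.0 / N_X)  # start from the uniform mean field
        self.z = 0
        return self._obs(), {}

    def _obs(self):
        one_hot_z = np.eye(N_Z, dtype=np.float32)[self.z]
        return np.concatenate([self.mu.astype(np.float32), one_hot_z])

    def step(self, action):
        h = np.exp(action.reshape(N_X, N_U))
        h /= h.sum(axis=1, keepdims=True)  # softmax per row: decision rule h(u | x)
        # Deterministic mean field flow: mu'(x') = sum_{x,u} mu(x) h(u|x) P(x'|x,u,z).
        mu_next = np.einsum("x,xu,uxy->y", self.mu, h, self.P[self.z])
        reward = float(self.mu[0] - 0.1 * self.z)  # placeholder reward
        self.mu = mu_next
        self.z = int(self.rng.choice(N_Z, p=self.Pz[self.z]))  # common env state evolves
        self.t += 1
        return self._obs(), reward, False, self.t >= HORIZON, {}

if __name__ == "__main__":
    # stable_baselines3 (>= 2.0, which accepts gymnasium envs) is one
    # illustrative choice; any standard continuous-control algorithm works.
    from stable_baselines3 import PPO
    model = PPO("MlpPolicy", MeanFieldMDP(), verbose=0)
    model.learn(total_timesteps=10_000)

Casting the mean field itself as the single-agent state is what makes the problem a Markov decision process over distributions, so established deep RL methods apply directly despite the continuous action space.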
