Scaling up Mean Field Games with Online Mirror Descent

We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result extends naturally to multi-population games and to settings involving common noise. A thorough experimental investigation across various single- and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP, solving, for the first time to our knowledge, examples of MFGs with hundreds of billions of states. This study establishes the state of the art for learning in large-scale multi-agent and multi-population games.
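To make the iteration concrete, below is a minimal sketch of discrete-time OMD for a finite-horizon MFG. The environment is a hypothetical toy example of our own choosing (a ring of states with a crowd-aversion reward, which is a monotone coupling of the kind the convergence result assumes); the state count, horizon, and learning rate tau are illustrative assumptions, not the paper's benchmarks. The update itself follows the OMD scheme the abstract refers to: accumulate Q-values computed against the mean-field flow induced by the current policy, and take the new policy as the softmax of the accumulated sum.

```python
import numpy as np

# Hedged sketch of Online Mirror Descent for a toy finite-horizon MFG.
# Ring-world with crowd-aversion reward: an illustrative assumption.

S, A, H = 10, 3, 20   # states, actions (left / stay / right), horizon
tau = 1.0             # OMD learning rate (illustrative)

def transition(s, a):
    return (s + a - 1) % S          # deterministic move on a ring

def reward(s, mu_t):
    return -np.log(mu_t[s] + 1e-8)  # crowd aversion: avoid crowded states

def mean_field(pi):
    """Forward pass: distribution flow mu[t] induced by policy pi[t, s, a]."""
    mu = np.zeros((H + 1, S))
    mu[0] = np.ones(S) / S
    for t in range(H):
        for s in range(S):
            for a in range(A):
                mu[t + 1, transition(s, a)] += mu[t, s] * pi[t, s, a]
    return mu

def q_values(pi, mu):
    """Backward pass: Q^{pi, mu}[t, s, a] against the frozen flow mu."""
    q = np.zeros((H, S, A))
    v = np.zeros(S)                 # terminal value is zero
    for t in reversed(range(H)):
        for s in range(S):
            for a in range(A):
                q[t, s, a] = reward(s, mu[t]) + v[transition(s, a)]
        v = (pi[t] * q[t]).sum(axis=1)
    return q

def softmax(y):
    z = np.exp(y - y.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

# OMD iterations: y accumulates Q-values; the policy is softmax(y).
y = np.zeros((H, S, A))
pi = softmax(y)
for n in range(200):
    mu = mean_field(pi)             # flow induced by the current policy
    y += tau * q_values(pi, mu)     # mirror-descent accumulation step
    pi = softmax(y)
```

The softmax over accumulated Q-values is the mirror-descent step with an entropic regularizer. In practice, convergence would be monitored via exploitability of the current policy against its own induced flow; under the monotonicity assumption above, the flow stabilizes at the Nash equilibrium.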
