On the Convergence of Model Free Learning in Mean Field Games

Learning from experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the non-stationarity of the environment, whose dynamics evolve as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g., swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active and burgeoning field has been studying how diverse reinforcement learning algorithms behave when agents with no prior information on a stationary Mean Field Game (MFG) learn their policy through repeated experience. We adopt a high-level perspective on this problem and analyze in full generality the convergence of a fictitious-play iterative scheme using any single-agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time the convergence of model-free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious-play scheme is computed with a deep RL algorithm.
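To make the iterative scheme concrete, below is a minimal Python sketch of fictitious play in a hypothetical toy finite-state MFG, not the paper's experimental setup: at each iteration the learner computes a best response against the current averaged mean-field flow, then folds the induced distribution flow back into the running average. The toy model (a ring of states with crowd-averse, target-seeking rewards) and exact dynamic programming standing in for the paper's model-free RL best-response subroutine are both illustrative assumptions.

```python
# Fictitious-play sketch for a toy finite mean field game (illustrative only).
# Toy model: S states on a ring, actions {-1, 0, +1}; agents prefer state 0
# but are penalized for crowding via -log of the population density.
# Exact backward induction plays the role of the single-agent (model-free)
# learning algorithm that computes an approximate best response.
import numpy as np

S, H = 20, 30                      # number of states, time horizon
ACTIONS = (-1, 0, 1)
DIST = np.minimum(np.arange(S), S - np.arange(S))  # ring distance to state 0

def best_response(mu):
    """Best response against a mean-field flow mu of shape (H, S):
    backward induction on the induced finite-horizon MDP, followed by a
    forward pass returning the state-distribution flow of the greedy policy."""
    V = np.zeros(S)                            # terminal value
    policy = np.zeros((H, S), dtype=int)
    for t in reversed(range(H)):
        Q = np.empty((len(ACTIONS), S))
        for i, a in enumerate(ACTIONS):
            nxt = (np.arange(S) + a) % S       # deterministic ring dynamics
            # reward: crowd aversion plus attraction towards state 0
            Q[i] = -np.log(mu[t] + 1e-8) - 0.2 * DIST + V[nxt]
        policy[t] = Q.argmax(axis=0)
        V = Q.max(axis=0)
    flow = np.zeros((H, S))
    flow[0] = np.ones(S) / S                   # uniform initial law
    for t in range(H - 1):
        for s in range(S):
            flow[t + 1, (s + ACTIONS[policy[t, s]]) % S] += flow[t, s]
    return flow

# Fictitious play: the belief is the running average of best-response flows.
mu_bar = np.ones((H, S)) / S
for n in range(1, 200):
    br_flow = best_response(mu_bar)
    mu_bar += (br_flow - mu_bar) / n           # 1/n learning rate = plain average

print("flow gap to best response:", np.abs(best_response(mu_bar) - mu_bar).max())
```

The 1/n update makes `mu_bar` the exact average of all past best-response flows, which is the classical fictitious-play rule; replacing `best_response` with any approximate single-agent learner is precisely the setting whose error propagation the paper analyzes.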
