Reinforcement Learning for Mean Field Games, with Applications to Economics

Mean field games (MFG) and mean field control (MFC) problems are frameworks for studying Nash equilibria or social optima in games with a continuum of agents. These problems can be used to approximate competitive or cooperative games with a large but finite number of agents, and they have found a broad range of applications, in particular in economics. In recent years, the question of learning in MFG and MFC has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which has led to the development of reinforcement learning (RL) methods. After reviewing the literature on this topic, we present a two-timescale RL approach for MFG and MFC, which relies on a unified Q-learning algorithm. The main novelty of this method is that it simultaneously updates an action-value function and a distribution, but at different rates, in a model-free fashion. Depending on the ratio of the two learning rates, the algorithm learns either the MFG or the MFC solution. To illustrate this method, we apply it to a mean field problem of accumulated consumption over a finite horizon with a HARA utility function, and to a trader’s optimal liquidation problem.
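The simultaneous update described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the algorithm as specified in the paper: the finite state and action spaces, the discounted cost-minimisation setting, the epsilon-greedy exploration, the constant learning rates `rho_q` and `rho_mu`, and the toy crowd-aversion environment `toy_step` are all hypothetical choices made here for illustration. The abstract does not say which ratio of the two rates yields the MFG solution and which yields the MFC solution, so the sketch leaves both rates as free parameters.

```python
import numpy as np


def two_timescale_q_learning(n_states, n_actions, step, rho_q, rho_mu,
                             n_steps=5000, gamma=0.9, eps=0.1, seed=0):
    """Tabular sketch: update a Q-table and a state distribution together,
    each with its own learning rate (rho_q and rho_mu, both in [0, 1])."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))        # action-value estimates
    mu = np.full(n_states, 1.0 / n_states)     # population distribution estimate
    x = int(rng.integers(n_states))            # current state of the representative agent

    for _ in range(n_steps):
        # Epsilon-greedy action; costs are minimised, hence argmin (assumption).
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmin(q[x]))

        # Model-free: the learner only observes a sampled transition and cost.
        x_next, cost = step(x, a, mu, rng)

        # Distribution update at rate rho_mu: move mu toward the point mass
        # at the newly observed state (a convex combination, so mu stays a
        # probability vector).
        delta = np.zeros(n_states)
        delta[x_next] = 1.0
        mu = mu + rho_mu * (delta - mu)

        # Q-learning update at rate rho_q.
        target = cost + gamma * q[x_next].min()
        q[x, a] += rho_q * (target - q[x, a])

        x = x_next

    return q, mu


def toy_step(x, a, mu, rng):
    """Hypothetical environment: states on a ring, actions move left, stay,
    or move right; the cost penalises crowded states and movement."""
    n = mu.shape[0]
    x_next = (x + (a - 1)) % n
    cost = float(mu[x_next] + 0.1 * abs(a - 1))
    return x_next, cost


# Example call with the distribution updated more slowly than the Q-table.
q_table, mu_hat = two_timescale_q_learning(5, 3, toy_step, rho_q=0.1, rho_mu=0.01)
```

Swapping the relative sizes of `rho_q` and `rho_mu` changes which quantity is treated as quasi-static from the other's point of view, which is the mechanism the abstract attributes to the two-timescale approach.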
