A General Framework for Learning Mean-Field Games

This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision making in stochastic games with a large population. It first establishes the existence and uniqueness of a Nash equilibrium for this GMFG, and it demonstrates that naively combining reinforcement learning with the fixed-point approach of classical mean-field games yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, and analyzes their convergence properties and computational complexity. Experiments on an equilibrium product pricing problem demonstrate that two specific instantiations, GMF-V with Q-learning (GMF-V-Q) and GMF-P with trust region policy optimization (GMF-P-TRPO), are both efficient and robust in the GMFG setting. Moreover, both outperform existing multiagent reinforcement learning algorithms for the N-player setting in convergence speed, accuracy, and stability.
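The fixed-point structure described above (solve for a policy given the population, then update the population given the policy) can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' GMF-V algorithm. It assumes a small finite state and action space with a known transition tensor `P[s, a, s']`, a hypothetical congestion reward `base_reward[s, a] - crowd_cost * mu[s]` that depends only on the state marginal `mu` (a simplification of the general mean field), and exact Q-value iteration in place of Q-learning. Each outer step computes Q-values for the current mean field, smooths them with a softmax at a hypothetical `temperature` parameter instead of taking the argmax, and pushes the population forward one step under the resulting policy. All function names and parameters are illustrative.

```python
import numpy as np


def softmax(q, temperature):
    """Row-wise softmax of a (states x actions) array; a smoothed version of argmax."""
    z = (q - q.max(axis=1, keepdims=True)) / temperature  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def q_given_mean_field(mu, base_reward, crowd_cost, P, gamma, iters=500):
    """Exact Q-value iteration for a FIXED mean field mu (stands in for Q-learning)."""
    # Hypothetical reward: base reward minus a congestion penalty on crowded states.
    r = base_reward - crowd_cost * mu[:, None]
    q = np.zeros_like(base_reward)
    for _ in range(iters):
        v = q.max(axis=1)                                 # greedy state values
        q = r + gamma * np.einsum("sat,t->sa", P, v)      # Bellman optimality update
    return q


def smoothed_fixed_point(P, base_reward, crowd_cost=1.0, gamma=0.9,
                         temperature=0.5, outer_iters=200):
    """Alternate best response (smoothed) and mean-field update on a toy model."""
    S, A, _ = P.shape
    mu = np.full(S, 1.0 / S)  # start from the uniform population distribution
    residuals = []
    for _ in range(outer_iters):
        q = q_given_mean_field(mu, base_reward, crowd_cost, P, gamma)
        pi = softmax(q, temperature)  # smoothed policy instead of the greedy argmax
        # Population transition under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t].
        P_pi = np.einsum("sa,sat->st", pi, P)
        mu_next = mu @ P_pi  # one step of the mean-field flow
        residuals.append(np.abs(mu_next - mu).sum())  # L1 change in the mean field
        mu = mu_next
    return mu, pi, residuals


# Usage on a random toy model (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize each P[s, a, :] into a distribution
base_reward = rng.random((S, A))
mu, pi, residuals = smoothed_fixed_point(P, base_reward)
print(mu.round(3), residuals[-1])
```

Letting `temperature` shrink toward zero approaches the greedy argmax update, which in this sketch plays the role of the naive combination; smoothed policies are what GMF-V and GMF-P use to avoid that instability. The `residuals` list (the L1 distance between successive mean fields) offers a simple way to monitor whether the outer iteration settles.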
