Reinforcement Learning in Stationary Mean-field Games

Multi-agent reinforcement learning has made significant progress in recent years, but it remains a hard problem. Hence, one often resorts to developing learning algorithms for specific classes of multi-agent systems. In this paper we study reinforcement learning in one such class of multi-agent systems called mean-field games. In particular, we consider learning in stationary mean-field games. We identify two solution concepts for such games---stationary mean-field equilibrium and stationary mean-field social-welfare optimal policy---depending on whether the agents are non-cooperative or cooperative, respectively. We then generalize these solution concepts to their local variants using arguments based on bounded rationality. For these two local solution concepts, we present two reinforcement learning algorithms. We show that the algorithms converge to the right solution under mild technical conditions and demonstrate this on two numerical examples.