Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

Recent advances in multiagent learning have introduced a family of algorithms built around the population-based training method PSRO, with convergence guarantees to Nash, correlated, and coarse correlated equilibria. However, as the number of agents grows, learning best responses becomes exponentially harder, which hampers PSRO-based training. The paradigm of mean-field games provides an asymptotic remedy to this problem when the games under consideration are anonymous and symmetric. Unfortunately, the mean-field approximation introduces non-linearities that prevent a straightforward adaptation of PSRO. Building on optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO that learns Nash, coarse correlated, and correlated equilibria in mean-field games. The key idea is to replace the exact distribution computation step with newly defined mean-field no-adversarial-regret learners, or with black-box optimization. We compare the asymptotic complexity of the approach to that of standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and establish theoretical robustness to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria.
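
To make the loop concrete, below is a minimal Python sketch of one plausible mean-field PSRO iteration in the spirit described above: expand a policy population with a best response to the current mean field, then update the mixture weights with a multiplicative-weights (Hedge-style) no-adversarial-regret meta-solver instead of an exact distribution computation. The game interface used here (initial_policy, mean_field, best_response, payoff) is hypothetical scaffolding, not an API from the paper, and Hedge merely stands in for whichever no-regret learner one prefers.

import numpy as np

def mean_field_psro(game, iterations=50, lr=0.1):
    """Sketch of a mean-field PSRO loop: population expansion by best
    response, mixture update by exponential weights (Hedge)."""
    population = [game.initial_policy()]
    weights = np.ones(1)  # unnormalized mixture weights over the population
    for _ in range(iterations):
        mixture = weights / weights.sum()
        # Mean field induced by playing the current population mixture.
        mu = game.mean_field(population, mixture)
        # Expand the population with an (approximate) best response to mu.
        population.append(game.best_response(mu))
        weights = np.append(weights, weights.mean())  # seed the new policy
        # No-adversarial-regret meta-solver in place of PSRO's exact
        # distribution computation: multiplicative-weights update on payoffs.
        payoffs = np.array([game.payoff(pi, mu) for pi in population])
        weights = weights * np.exp(lr * payoffs)
        weights /= weights.sum()  # renormalize for numerical stability
    return population, weights

The black-box-optimization route mentioned in the abstract would instead search directly over the mixture simplex (for instance with random search or CMA-ES) for weights that perform well against the mean field they induce.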
