For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result's applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the common payoff and to the local optimum. Applied to machine learning, our result provides a global guarantee for any gradient method that finds a local optimum in symmetric strategy space. While this result indicates stability to unilateral deviation, we nevertheless identify broad classes of games where mixed local optima are unstable under joint, asymmetric deviations. We analyze the prevalence of instability by running learning algorithms in a suite of symmetric games, and we conclude by discussing the applicability of our results to multi-agent RL, cooperative inverse RL, and decentralized POMDPs.
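
To make the central claim concrete, the following is a minimal sketch (not taken from the paper) that illustrates it on a randomly generated two-player symmetric common-payoff matrix game: projected gradient ascent is run on a single shared mixed strategy x, and the resulting symmetric profile (x, x) is then checked against the Nash condition that no unilateral deviation improves the common payoff. The payoff matrix, step size, and simplex-projection helper are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (not from the paper): a locally optimal symmetric strategy
# in a symmetric common-payoff matrix game should also be a Nash equilibrium.
import numpy as np

rng = np.random.default_rng(0)
n = 4                                   # number of actions (illustrative)
A = rng.normal(size=(n, n))
U = (A + A.T) / 2                       # symmetric common payoff: U[a, b] = U[b, a]

def project_to_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

# Projected gradient ascent on the *symmetric* strategy x (both players play x).
x = np.full(n, 1.0 / n)
for _ in range(5000):
    grad = 2.0 * U @ x                  # gradient of x^T U x for symmetric U
    x = project_to_simplex(x + 0.05 * grad)

value = x @ U @ x                       # common payoff at the symmetric profile (x, x)
best_dev = np.max(U @ x)                # payoff of the best unilateral pure deviation
print(f"profile value      : {value:.6f}")
print(f"best deviation gain: {best_dev - value:.2e}")  # ~0 at a Nash equilibrium
```

At a stationary point of the symmetric ascent, every action in the support of x earns the profile value and every action outside it earns no more, which is exactly why the printed deviation gain is near zero; the paper's result is the general statement of this guarantee for local optima in symmetric strategy space.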
