Evolutionary Dynamics and Φ-Regret Minimization in Games

Regret is a foundational concept in online learning and has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner's performance and that of a baseline in hindsight. It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full mixed strategy space (i.e., probability distributions over pure strategies), through the lens of the previously established Φ-regret framework, which provides a continuum of stronger regret measures. Importantly, Φ-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of Φ-regret in generic 2 × 2 games, without any modification of the underlying algorithm itself. We then conduct experiments validating our theoretical results in a suite of 144 2 × 2 games in which RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of Φ-regret minimization by RD in some larger games, hinting at further opportunity for Φ-regret-based study of such algorithms from both a theoretical and an empirical perspective.
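
To make the objects in the abstract concrete, the following is a minimal sketch, not the paper's experimental code. It assumes a forward-Euler discretization of two-population replicator dynamics, a hypothetical Matching Pennies payoff matrix, and illustrative step size and horizon. It measures only time-averaged external regret for the row player, the weakest member of the Φ-regret family, rather than the strongest form analyzed in the paper. The function names are illustrative.

```python
import numpy as np

def replicator_step(x, y, A, B, dt):
    """One forward-Euler step of two-population replicator dynamics (assumed discretization)."""
    u = A @ y      # row player's payoff for each pure action
    v = B.T @ x    # column player's payoff for each pure action
    x_new = x + dt * x * (u - x @ u)
    y_new = y + dt * y * (v - y @ v)
    # Renormalize only to guard against floating-point drift off the simplex.
    return x_new / x_new.sum(), y_new / y_new.sum()

def average_external_regret(A, B, x0, y0, dt=0.01, steps=20000):
    """Time-averaged external regret of the row player along the trajectory."""
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    cum_action = np.zeros(A.shape[0])  # cumulative payoff of each fixed pure action
    cum_played = 0.0                   # cumulative payoff of the strategy actually played
    for _ in range(steps):
        u = A @ y
        cum_action += dt * u
        cum_played += dt * (x @ u)
        x, y = replicator_step(x, y, A, B, dt)
    horizon = dt * steps
    # Best fixed pure action in hindsight versus realized payoff, per unit time.
    return (cum_action.max() - cum_played) / horizon

# Hypothetical example game: Matching Pennies (zero-sum).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
regret = average_external_regret(A, B, [0.7, 0.3], [0.4, 0.6])
print(f"time-averaged external regret: {regret:.4f}")
```

Under these assumptions the averaged regret should shrink as the horizon grows; checking stronger Φ-regret notions would require comparing against deviations between mixed strategies rather than a single fixed action.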
