Evolutionary Dynamics and Φ-Regret Minimization in Games

Regret is a foundational concept in online learning and has important applications in the analysis of learning dynamics in games. It quantifies the difference between a learner’s performance and that of a baseline in hindsight. It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations only to deterministic actions or strategies. In this paper, we revisit regret from the perspective of deviations over partitions of the full mixed strategy space (i.e., probability distributions over pure strategies), through the lens of the previously established Φ-regret framework, which provides a continuum of stronger regret measures. Importantly, Φ-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove that the well-studied evolutionary learning algorithm of replicator dynamics (RD) minimizes the strongest possible form of Φ-regret in generic 2 × 2 games, without any modification of the underlying algorithm. We then conduct experiments validating our theoretical results on a suite of 144 2 × 2 games in which RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of Φ-regret minimization by RD in some larger games, hinting at further opportunity for Φ-regret-based study of such algorithms from both a theoretical and an empirical perspective.
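To make the objects in the abstract concrete, the following is a minimal sketch, not the paper's experimental code. It simulates an Euler discretization of replicator dynamics in a 2 × 2 game and measures the row player's time-averaged external regret, which is only the weakest member of the Φ-regret family discussed above. The function names, step size, horizon, and the Matching Pennies payoffs are all illustrative assumptions.

```python
import numpy as np

def replicator_step(x, y, A, B, dt):
    """One Euler step of two-population replicator dynamics (a discretization
    chosen for this sketch). A[i, j] is the row player's payoff and B[i, j]
    the column player's payoff at the pure profile (i, j)."""
    fx = A @ y      # payoff of each row action against the mixed strategy y
    fy = B.T @ x    # payoff of each column action against the mixed strategy x
    x_new = x + dt * x * (fx - x @ fx)
    y_new = y + dt * y * (fy - y @ fy)
    # Renormalize to guard against floating-point drift off the simplex.
    return x_new / x_new.sum(), y_new / y_new.sum()

def average_external_regret(A, B, x0, y0, dt=0.01, steps=20000):
    """Row player's time-averaged external regret along the trajectory:
    (payoff of the best fixed pure action in hindsight - realized payoff) / T."""
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    cum_action_payoff = np.zeros(A.shape[0])
    cum_realized_payoff = 0.0
    for _ in range(steps):
        fx = A @ y
        cum_action_payoff += dt * fx
        cum_realized_payoff += dt * (x @ fx)
        x, y = replicator_step(x, y, A, B, dt)
    horizon = steps * dt
    return (cum_action_payoff.max() - cum_realized_payoff) / horizon

# Illustrative game (an assumption, not taken from the paper): Matching Pennies.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
regret = average_external_regret(A, B, x0=[0.7, 0.3], y0=[0.4, 0.6])
print(f"time-averaged external regret: {regret:.4f}")
```

External regret compares the realized payoff only against fixed pure actions. The stronger Φ-regret notions in the abstract would replace the `cum_action_payoff.max()` baseline with a maximum over a richer set of strategy transformations, which this sketch does not attempt.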
