Paper information - Evolutionary Dynamics and Φ-Regret Minimization in Games

Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner’s performance and that of a baseline in hindsight. It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full mixed strategy space (i.e., probability distributions over pure strategies), under the lens of the previously established Φ-regret framework, which provides a continuum of stronger regret measures. Importantly, Φ-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of Φ-regret in generic 2 × 2 games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 2 × 2 games wherein RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of Φ-regret minimization by RD in some larger games, hinting at further opportunity for Φ-regret-based study of such algorithms from both a theoretical and empirical perspective.
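To make the abstract's two central objects concrete, the following is a minimal sketch, not the authors' code, of replicator dynamics in a two-player 2 × 2 game together with the row player's time-averaged external regret. External regret is only the weakest notion the abstract lists; the stronger Φ-regret measures studied in the paper are not computed here. The forward-Euler discretization, the step size `dt`, the function name `replicator_external_regret`, and the Prisoner's Dilemma payoffs are all assumptions chosen for illustration.

```python
import numpy as np


def replicator_external_regret(A, B, x0, y0, dt=0.01, steps=5000):
    """Euler-discretized replicator dynamics for a bimatrix game (illustrative).

    A[i, j] and B[i, j] are the row and column player's payoffs when the row
    player picks action i and the column player picks action j.
    Returns the final mixed strategies and the row player's time-averaged
    external regret (best fixed action in hindsight minus realized payoff).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()

    cum_action_payoff = np.zeros(A.shape[0])  # integral of each pure action's payoff
    cum_realized = 0.0                        # integral of the payoff actually earned

    for _ in range(steps):
        u_row = A @ y      # payoff of each row action against the current y
        u_col = B.T @ x    # payoff of each column action against the current x

        cum_action_payoff += dt * u_row
        cum_realized += dt * (x @ u_row)

        # Replicator update: an action's share grows in proportion to its
        # payoff advantage over the current average payoff.
        x_next = x + dt * x * (u_row - x @ u_row)
        y_next = y + dt * y * (u_col - y @ u_col)
        x, y = x_next, y_next

    horizon = dt * steps
    avg_regret = (cum_action_payoff.max() - cum_realized) / horizon
    return x, y, avg_regret


# Hypothetical example: a Prisoner's Dilemma (action 0 = cooperate, 1 = defect).
A_pd = np.array([[3.0, 0.0],
                 [5.0, 1.0]])
B_pd = A_pd.T  # symmetric game: the column player faces the same payoffs

x_final, y_final, avg_regret = replicator_external_regret(
    A_pd, B_pd, x0=[0.5, 0.5], y0=[0.5, 0.5]
)
print("final row strategy:", x_final)
print("time-averaged external regret:", avg_regret)
```

In this example defection strictly dominates, so the dynamics should concentrate on it and the averaged regret should shrink as the horizon `dt * steps` grows. Extending the sketch to internal, swap, or general Φ-regret would require tracking deviations between strategies rather than only deviations to a single fixed action.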

[1] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.

[2] James Hannan, et al. Approximation to Bayes Risk in Repeated Play, 1958.

[3] A. Rapoport, et al. 2 × 2 games played once, 1972.

[4] P. Taylor, et al. Evolutionarily Stable Strategies and Game Dynamics, 1978.

[5] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[6] J. Hofbauer. Evolutionary dynamics for bimatrix games: A Hamiltonian system?, 1996, Journal of Mathematical Biology.

[7] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.

[8] Josef Hofbauer, et al. Evolutionary Games and Population Dynamics, 1998.

[9] C. Harris. On the Rate of Convergence of Continuous-Time Fictitious Play, 1998.

[10] Y. Freund, et al. Adaptive game playing using multiplicative weights, 1999.

[11] Eizo Akiyama, et al. Chaos in learning a simple two-person game, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[12] Tom Lenaerts, et al. A selection-mutation model for q-learning in multi-agent systems, 2003, AAMAS '03.

[13] Amy Greenwald, et al. A General Class of No-Regret Learning Algorithms and Game-Theoretic Equilibria, 2003, COLT.

[14] Ehud Lehrer, et al. A wide range no-regret theorem, 2003, Games Econ. Behav.

[15] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[16] Bikramjit Banerjee, et al. Performance Bounded Reinforcement Learning in Strategic Interactions, 2004, AAAI.

[17] H. Peyton Young, et al. Strategic Learning and Its Limits, 2004.

[18] Leslie Pack Kaelbling, et al. Hedged learning: regret-minimization with learning experts, 2005, ICML.

[19] Karl Tuyls, et al. An Evolutionary Dynamical Analysis of Multi-Agent Learning in Iterated Games, 2005, Autonomous Agents and Multi-Agent Systems.

[20] D. Robinson, et al. The topology of the 2x2 games: a new periodic table, 2005.

[21] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[22] Yishay Mansour, et al. From External to Internal Regret, 2005, J. Mach. Learn. Res.

[23] Seshadhri Comandur, et al. Adaptive Algorithms for Online Decision Problems, Electronic Colloquium on Computational Complexity, Report No. 88 (2007), 2022.