Simple Uncoupled No-regret Learning Dynamics for Extensive-form Correlated Equilibrium

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature of the game and the presence of partial information, extensive-form correlation has significantly different properties from its normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium, but until now it was unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this article, we give the first uncoupled no-regret dynamics that converge with high probability to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-regret algorithm that guarantees with high probability that trigger regrets grow sublinearly in the number of iterations.
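To make the normal-form notion of internal regret concrete, the following is a minimal sketch for a single player, not the algorithm of this article. It assumes the standard definition: the internal regret for a pair of actions (a, b) is the total utility the player would have gained by playing b on every round in which it actually played a. The function names and the input layout (one utility vector per round, giving the payoff of each action against the opponents' realized play) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def internal_regret(
    actions: Sequence[int],
    utilities: Sequence[Sequence[float]],
    num_actions: int,
) -> List[List[float]]:
    """Return the matrix R where R[a][b] is the cumulative gain from
    swapping action a for action b on the rounds where a was played.

    actions[t]      : index of the action played at round t (assumed input)
    utilities[t][b] : utility action b would have earned at round t (assumed input)
    """
    regret = [[0.0] * num_actions for _ in range(num_actions)]
    for played, u in zip(actions, utilities):
        for alt in range(num_actions):
            # Only the row of the action actually played is updated.
            regret[played][alt] += u[alt] - u[played]
    return regret


def max_internal_regret(
    actions: Sequence[int],
    utilities: Sequence[Sequence[float]],
    num_actions: int,
) -> float:
    """Largest entry of the internal-regret matrix (diagonal entries are 0)."""
    matrix = internal_regret(actions, utilities, num_actions)
    return max(max(row) for row in matrix)


if __name__ == "__main__":
    played = [0, 0, 1]
    payoffs = [[1.0, 2.0], [0.0, 3.0], [5.0, 1.0]]
    print(internal_regret(played, payoffs, 2))
    print(max_internal_regret(played, payoffs, 2))
```

When this maximum grows sublinearly in the number of rounds for every player, the empirical frequency of play approaches the set of normal-form correlated equilibria, which is the classical result the abstract refers to. Trigger regret, as described above, extends this pairwise comparison to the extensive-form setting; its precise definition is given in the article and is not reproduced in this sketch.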
