No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Recently, there has been growing interest in solution concepts for extensive-form games that are less restrictive than Nash equilibrium, with significant effort devoted to computing extensive-form correlated equilibrium (EFCE) and extensive-form coarse correlated equilibrium (EFCCE). In this paper, we show how to leverage the popular counterfactual regret minimization (CFR) paradigm to induce simple no-regret dynamics that converge to the set of EFCEs and EFCCEs in n-player general-sum extensive-form games. For EFCE, we define a notion of internal regret suitable for extensive-form games and exhibit an efficient no-internal-regret algorithm. These results complement those for normal-form games introduced in the seminal paper by Hart and Mas-Colell. For EFCCE, we show that no modification of CFR is needed: the empirical frequency of play generated when all players use the original CFR algorithm converges to the set of EFCCEs.
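As a rough illustration of the kind of no-regret dynamics discussed above, the following sketch runs regret matching (the local update rule that CFR applies at each decision point) for two players in a small normal-form game and measures how far the averaged joint play is from a coarse correlated equilibrium. This is the normal-form analogue only, not CFR and not the paper's EFCE or EFCCE algorithm. The Chicken payoff matrices, the function names, and the use of expected (rather than sampled) updates are all assumptions made for the example; with expected updates, the average of the per-round product distributions stands in for the empirical frequency of play.

```python
# Assumed example game (Chicken); rows are player 1's actions, columns player 2's.
CHICKEN_U1 = [[6.0, 2.0], [7.0, 0.0]]
CHICKEN_U2 = [[6.0, 7.0], [2.0, 0.0]]


def regret_matching_strategy(regrets):
    """Play each action with probability proportional to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def run_dynamics(u1, u2, iterations):
    """Run regret matching for both players; return the averaged joint distribution."""
    n1, n2 = len(u1), len(u1[0])
    regrets1, regrets2 = [0.0] * n1, [0.0] * n2
    avg_joint = [[0.0] * n2 for _ in range(n1)]
    for _ in range(iterations):
        s1 = regret_matching_strategy(regrets1)
        s2 = regret_matching_strategy(regrets2)
        # Expected payoff of each pure action against the opponent's current mix.
        v1 = [sum(u1[a][b] * s2[b] for b in range(n2)) for a in range(n1)]
        v2 = [sum(u2[a][b] * s1[a] for a in range(n1)) for b in range(n2)]
        ev1 = sum(s1[a] * v1[a] for a in range(n1))
        ev2 = sum(s2[b] * v2[b] for b in range(n2))
        # Accumulate external regret: action value minus value of the mix played.
        for a in range(n1):
            regrets1[a] += v1[a] - ev1
        for b in range(n2):
            regrets2[b] += v2[b] - ev2
        # Accumulate this round's product distribution into the running average.
        for a in range(n1):
            for b in range(n2):
                avg_joint[a][b] += s1[a] * s2[b] / iterations
    return avg_joint


def cce_gap(u1, u2, joint):
    """Largest gain any player gets by committing to one fixed action up front."""
    n1, n2 = len(u1), len(u1[0])
    cells = [(a, b) for a in range(n1) for b in range(n2)]
    value1 = sum(joint[a][b] * u1[a][b] for a, b in cells)
    value2 = sum(joint[a][b] * u2[a][b] for a, b in cells)
    best_dev1 = max(sum(joint[a][b] * u1[d][b] for a, b in cells) for d in range(n1))
    best_dev2 = max(sum(joint[a][b] * u2[a][d] for a, b in cells) for d in range(n2))
    return max(best_dev1 - value1, best_dev2 - value2)


if __name__ == "__main__":
    joint = run_dynamics(CHICKEN_U1, CHICKEN_U2, 10000)
    print("averaged joint distribution:", joint)
    print("CCE gap:", cce_gap(CHICKEN_U1, CHICKEN_U2, joint))
```

The gap reported by `cce_gap` equals the larger of the two players' average external regrets, so it shrinks at the usual O(1/sqrt(T)) rate of regret matching. Converging to correlated (rather than coarse correlated) equilibria requires controlling internal regret instead, which is the normal-form result of Hart and Mas-Colell that the abstract refers to.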

[1]  Miroslav Dudík, et al. A Sampling-Based Approach to Computing Equilibria in Succinct Extensive-Form Games, 2009, UAI.

[2]  S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.

[3]  Christos H. Papadimitriou, et al. Computing correlated equilibria in multi-player games, 2005, STOC '05.

[4]  Nicola Gatti, et al. Learning to Correlate in Multi-Player General-Sum Sequential Games, 2019, NeurIPS.

[5]  Tim Roughgarden, et al. How bad is selfish routing?, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[6]  Geoffrey J. Gordon, et al. No-regret learning in convex games, 2008, ICML '08.

[7]  Michael H. Bowling, et al. Bayes' Bluff: Opponent Modelling in Poker, 2005, UAI 2005.

[8]  Bernhard von Stengel, et al. Extensive-Form Correlated Equilibrium: Definition and Computational Complexity, 2008, Math. Oper. Res.

[9]  R. Aumann. Subjectivity and Correlation in Randomized Strategies, 1974.

[10]  Paul W. Goldberg, et al. The complexity of computing a Nash equilibrium, 2006, STOC '06.

[11]  Tuomas Sandholm, et al. Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks, 2019, NeurIPS.

[12]  Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.

[13]  J. Nash. Equilibrium Points in N-Person Games, 1950, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Xiaotie Deng, et al. Settling the Complexity of Two-Player Nash Equilibrium, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[15]  James Hannan, et al. Approximation to Bayes Risk in Repeated Play, 1958.

[16]  Tuomas Sandholm, et al. Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium, 2019, NeurIPS.

[17]  Stefano Coniglio, et al. Computing Optimal Ex Ante Correlated Equilibria in Two-Player Sequential Games, 2019, AAMAS.

[18]  Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.

[19]  Christos H. Papadimitriou, et al. Worst-case Equilibria, 1999, STACS.

[20]  S. Ross. Goofspiel -- The Game of Pure Strategy, 1971.

[21]  Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.

[22]  R. Vohra, et al. Calibrated Learning and Correlated Equilibrium, 1996.

[23]  Tuomas Sandholm, et al. Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games, 2018, AAAI.

[24]  Kevin Leyton-Brown, et al. Polynomial-time computation of exact correlated equilibrium in compact games, 2010, EC '11.

[25]  Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[26]  Yoav Shoham, et al. Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations, 2009.

[27]  J. Vial, et al. Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon, 1978.

[28]  Bernhard von Stengel, et al. Computing an Extensive-Form Correlated Equilibrium in Polynomial Time, 2008, WINE.

[29]  Tuomas Sandholm, et al. Ex ante coordination and collusion in zero-sum multi-player extensive-form games, 2018, NeurIPS.

[30]  Tuomas Sandholm, et al. Coarse Correlation in Extensive-Form Games, 2019, AAAI.