Simple Uncoupled No-regret Learning Dynamics for Extensive-form Correlated Equilibrium

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature of the game and the presence of partial information, extensive-form correlation has significantly different properties from its normal-form counterpart, many of which are still open research questions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium, but it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this article, we give the first uncoupled no-regret dynamics that converge with high probability to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-regret algorithm that guarantees, with high probability, that trigger regrets grow sublinearly in the number of iterations.
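
The normal-form result that the abstract builds on (internal-regret minimization driving play toward a correlated equilibrium) can be illustrated with a small Python sketch. This is not the paper's extensive-form algorithm and it does not implement trigger regret. Assumptions: a two-player game with utilities scaled into [0, 1]; internal regret is minimized with the Blum-Mansour swap-regret reduction over Hedge (a swapped-in, standard technique); and, to keep the run deterministic, the sketch averages the product of the two mixed strategies instead of sampling actions. The names `SwapRegretLearner`, `self_play`, and `ce_gap` are hypothetical. `ce_gap` returns the largest expected gain any player could obtain from a "when told a, play a2 instead" deviation; for the averaged distribution this equals the largest average internal regret, which is why low internal regret means an approximate correlated equilibrium.

```python
import math


def stationary(Q):
    """Stationary distribution pi = pi Q of a row-stochastic matrix Q.

    Uses the subtraction-free GTH (Grassmann-Taksar-Heyman) state reduction,
    which stays accurate even when Q is close to the identity. Assumes Q is
    irreducible, which holds here because Hedge weights are strictly positive.
    """
    n = len(Q)
    P = [row[:] for row in Q]
    for k in range(n - 1, 0, -1):
        s = sum(P[k][j] for j in range(k))
        for i in range(k):
            P[i][k] /= s
        for i in range(k):
            for j in range(k):
                P[i][j] += P[i][k] * P[k][j]
    pi = [1.0] + [0.0] * (n - 1)
    for j in range(1, n):
        pi[j] = sum(pi[i] * P[i][j] for i in range(j))
    total = sum(pi)
    return [x / total for x in pi]


class SwapRegretLearner:
    """Blum-Mansour reduction: one Hedge copy per action, combined via pi = pi Q."""

    def __init__(self, n_actions, horizon):
        self.n = n_actions
        # Standard Hedge step size for losses in [0, 1] over a known horizon.
        self.eta = math.sqrt(8 * math.log(n_actions) / horizon)
        # cum_loss[k][j]: cumulative scaled loss of action j seen by copy k.
        self.cum_loss = [[0.0] * n_actions for _ in range(n_actions)]
        self.p = [1.0 / n_actions] * n_actions

    def strategy(self):
        Q = []
        for row in self.cum_loss:
            m = min(row)  # shift exponents so they are <= 0 (avoids overflow)
            w = [math.exp(-self.eta * (l - m)) for l in row]
            s = sum(w)
            Q.append([x / s for x in w])
        self.p = stationary(Q)
        return self.p

    def update(self, loss):
        # Copy k is charged the loss vector scaled by the probability p[k].
        for k in range(self.n):
            for j in range(self.n):
                self.cum_loss[k][j] += self.p[k] * loss[j]


def ce_gap(mu, U1, U2):
    """Largest expected gain of any 'play a' -> 'play a2' deviation under mu."""
    n1, n2 = len(U1), len(U1[0])
    gap = 0.0
    for a in range(n1):
        for a2 in range(n1):
            gap = max(gap, sum(mu[a][b] * (U1[a2][b] - U1[a][b]) for b in range(n2)))
    for b in range(n2):
        for b2 in range(n2):
            gap = max(gap, sum(mu[a][b] * (U2[a][b2] - U2[a][b]) for a in range(n1)))
    return gap


def self_play(U1, U2, horizon):
    """Both players minimize swap regret; returns the average joint distribution."""
    n1, n2 = len(U1), len(U1[0])
    p1, p2 = SwapRegretLearner(n1, horizon), SwapRegretLearner(n2, horizon)
    mu = [[0.0] * n2 for _ in range(n1)]
    for _ in range(horizon):
        x, y = p1.strategy(), p2.strategy()
        for a in range(n1):
            for b in range(n2):
                mu[a][b] += x[a] * y[b] / horizon
        # Losses are 1 - expected utility, so utilities must lie in [0, 1].
        p1.update([1 - sum(U1[a][b] * y[b] for b in range(n2)) for a in range(n1)])
        p2.update([1 - sum(U2[a][b] * x[a] for a in range(n1)) for b in range(n2)])
    return mu


# Game of Chicken (actions: 0 = Dare, 1 = Chicken), payoffs scaled into [0, 1].
U1 = [[0.0, 1.0], [2 / 7, 6 / 7]]
U2 = [[U1[b][a] for b in range(2)] for a in range(2)]  # symmetric game
mu = self_play(U1, U2, horizon=10_000)
print(ce_gap(mu, U1, U2))
```

Each Hedge copy has regret at most ln n/η + ηT/8 = sqrt(T ln n / 2) with this step size, so each player's internal regret is at most n·sqrt(T ln n / 2). For this 2×2 game with T = 10,000, the printed gap should therefore stay below roughly 0.012. With sampled actions instead of averaged mixed strategies, the empirical frequency of play differs from this average by a term that vanishes with high probability as T grows.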

Nicola Gatti | Gabriele Farina | Alberto Marchesi | Andrea Celli