Learning to Correlate in Multi-Player General-Sum Sequential Games

In multi-player, general-sum games, there is growing interest in solution concepts that involve some form of communication among players, since they can lead to socially better outcomes than Nash equilibria and may be reached through decentralized learning dynamics. In this paper, we focus on coarse correlated equilibria (CCEs) in sequential games. First, we complete the picture on the complexity of finding social-welfare-maximizing CCEs by proving that the problem is not in Poly-APX, unless P = NP, in games with three or more players (including chance). Then, we provide simple arguments showing that counterfactual regret minimization (CFR), which works with behavioral strategies, may not converge to a CCE in multi-player, general-sum sequential games. To address this issue, we devise two variants of CFR that provably converge to a CCE. The first (CFR-S) is a simple stochastic adaptation of CFR that employs sampling to build a correlated strategy, whereas the second (CFR-Jr) enhances CFR with a more involved reconstruction procedure that recovers correlated strategies from behavioral ones. Experiments on a rich testbed of multi-player, general-sum sequential games show that both CFR-S and CFR-Jr are dramatically faster than state-of-the-art algorithms for computing CCEs, and that CFR-Jr is also a good heuristic for finding socially optimal CCEs.
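
The idea of building a correlated strategy from sampled play can be illustrated in a much simpler setting than the one studied here. The sketch below is not CFR-S or CFR-Jr: it is a minimal normal-form analogue in which each player runs regret matching, a pure action is sampled for each player at every iteration, and the empirical frequency of the sampled joint actions serves as the correlated strategy. The payoff table, the function names (`regret_matching`, `run`, `cce_gap`), and the iteration count are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical two-player general-sum game, chosen only for illustration.
# PAYOFFS[(a0, a1)] = (utility of player 0, utility of player 1)
PAYOFFS = {
    (0, 0): (6, 6), (0, 1): (2, 7),
    (1, 0): (7, 2), (1, 1): (0, 0),
}
N_ACTIONS = 2
N_PLAYERS = 2


def regret_matching(cum_regret):
    """Play each action with probability proportional to its positive regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def run(iterations, seed=0):
    """Return the empirical distribution over sampled joint actions."""
    rng = random.Random(seed)
    regrets = [[0.0] * N_ACTIONS for _ in range(N_PLAYERS)]
    joint_counts = Counter()
    for _ in range(iterations):
        # Each player independently samples a pure action from its strategy.
        joint = tuple(
            rng.choices(range(N_ACTIONS), weights=regret_matching(regrets[p]))[0]
            for p in range(N_PLAYERS)
        )
        joint_counts[joint] += 1
        # Update regrets against the sampled joint action.
        for p in range(N_PLAYERS):
            realized = PAYOFFS[joint][p]
            for a in range(N_ACTIONS):
                deviation = joint[:p] + (a,) + joint[p + 1:]
                regrets[p][a] += PAYOFFS[deviation][p] - realized
    return {j: c / iterations for j, c in joint_counts.items()}


def cce_gap(dist):
    """Largest gain any player obtains by committing to one fixed action
    before the joint action is drawn from dist (0 means dist is a CCE)."""
    gap = 0.0
    for p in range(N_PLAYERS):
        expected = sum(pr * PAYOFFS[j][p] for j, pr in dist.items())
        for a in range(N_ACTIONS):
            deviate = sum(
                pr * PAYOFFS[j[:p] + (a,) + j[p + 1:]][p]
                for j, pr in dist.items()
            )
            gap = max(gap, deviate - expected)
    return gap
```

Calling `cce_gap(run(20000))` measures how far the empirical joint distribution is from satisfying the CCE incentive constraints in this toy game; the gap should shrink as the number of iterations grows. Note that the distribution is kept over joint actions rather than as a product of per-player averages, which is what allows it to represent correlation.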
