Decentralized No-regret Learning Algorithms for Extensive-form Correlated Equilibria (Extended Abstract)

The existence of uncoupled no-regret learning dynamics converging to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature of play and the presence of private information, correlation in extensive-form games has significantly different properties from correlation in normal-form games. The extensive-form correlated equilibrium (EFCE) is the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device (a.k.a. mediator) must take into account the evolution of the beliefs of each player as they make observations throughout the game. Due to this additional complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics that provably converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play after T game repetitions is guaranteed to be an O(1/√T)-approximate EFCE with high probability, and an EFCE almost surely in the limit.

∗ The complete version of this paper won a best paper award at NeurIPS 2020 [Celli et al., 2020]. Some of the results presented here only appear in the full version of the paper [Farina et al., 2021].
† Contact Author.

1 Motivation

This work studies decision-making problems in which rational individuals interact with a centralized planner. The centralized planner cannot directly tell the individuals what to do; instead, its goal is to steer their behavior toward mutually beneficial outcomes. Many real-world problems exhibit this type of interaction, and it is increasingly common in today's gig economy. Think, for example, of ride-sharing or food delivery platforms, where drivers provide services to customers, and the whole market is centralized through a single app that every agent connects to.

Because the individual decision makers in the system have free will, the central planner has to take into consideration the fact that each of them will act selfishly according to their own objectives. Therefore, to get them to behave in a certain way, the central planner must nudge them using the right incentives. This type of soft coordination is already enough to steer the system to levels of social welfare that would be largely unattainable in the absence of a central planner, that is, without any form of coordination among the decision makers.

The strategy that the central planner should follow when interacting with decision makers is called a correlated equilibrium in the game theory literature. The key feature of a correlated equilibrium is that all the decision makers receive the right incentives to follow the planner's recommendations. This means that no agent would want to deviate from what the central planner recommends to them. The study of this type of equilibrium goes back to the seminal work of Robert Aumann [Aumann, 1974], who was later awarded a Nobel prize in economics for his work on game-theoretic cooperation.
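To make the incentive condition above concrete, the following minimal sketch checks whether a distribution over recommended action profiles is a correlated equilibrium of a small normal-form game. It is only an illustration of the classical normal-form notion, not of the EFCE or of the learning dynamics introduced in this work. The payoff table (a version of the game of Chicken), the two example distributions, and all names such as `max_deviation_gain` are assumptions chosen for the example.

```python
from itertools import product

# Hypothetical two-player game of Chicken; actions: 0 = Dare, 1 = Yield.
# PAYOFFS[(a0, a1)] = (payoff to player 0, payoff to player 1).
PAYOFFS = {
    (0, 0): (0, 0),
    (0, 1): (7, 2),
    (1, 0): (2, 7),
    (1, 1): (6, 6),
}
NUM_PLAYERS = 2
NUM_ACTIONS = 2


def max_deviation_gain(dist):
    """Return the largest expected gain any player can obtain by replacing
    one recommended action with another one (0.0 if no deviation helps).

    `dist` maps action profiles to probabilities. Gains are weighted by the
    probability of the recommendation, so `dist` is a correlated equilibrium
    exactly when the returned value is 0.
    """
    worst = 0.0
    for player in range(NUM_PLAYERS):
        for rec, dev in product(range(NUM_ACTIONS), repeat=2):
            if rec == dev:
                continue
            gain = 0.0
            for profile, prob in dist.items():
                if profile[player] != rec:
                    continue  # this deviation only applies when `rec` is recommended
                deviated = list(profile)
                deviated[player] = dev
                gain += prob * (
                    PAYOFFS[tuple(deviated)][player] - PAYOFFS[profile][player]
                )
            worst = max(worst, gain)
    return worst


# The mediator never recommends (Dare, Dare) and mixes uniformly otherwise.
ce_dist = {(0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}
# The mediator always recommends (Yield, Yield).
bad_dist = {(1, 1): 1.0}

print(max_deviation_gain(ce_dist))   # no profitable deviation
print(max_deviation_gain(bad_dist))  # a player gains by switching to Dare
```

Under these assumed payoffs, the first distribution gives no player a reason to ignore a recommendation, while the second does not: a player told to Yield would rather Dare, knowing the opponent was also told to Yield.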
Since Aumann's work, there has been much effort in scaling up the computation of correlated equilibria and in designing algorithms that guarantee some key properties. In particular, a crucial property that algorithms for computing correlated equilibria should satisfy is decentralization. This essentially means that the behavior and incentives of each agent should be computed independently of the other agents. Decentralization is fundamental for the computation to scale well, and it allows the agents to converge to an equilibrium point without the need for a central planner. Furthermore, decentralization preserves the agents' privacy during the learning process. Indeed, agents should not need to report their

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21) Sister Conferences Best Papers Track
