Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS '21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is O(polylog(T)) after T repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of Õ(T^{-1}). This substantially improves over the prior best rate of convergence for correlated equilibria of O(T^{-3/4}) due to Chen and Peng (NeurIPS '20), and it is optimal, within the no-regret framework, up to polylogarithmic factors in T. To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed-point operations. Specifically, we establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn '05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (far better-behaved) linear transformation on an exponential-sized set, enabling us to leverage similar techniques as DFG to near-optimally bound the internal regret. Moreover, we establish an O(polylog(T)) no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR '07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to shedding light on the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques of DFG can be extended and leveraged in the analysis of more involved learning algorithms.
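To make the Blum–Mansour construction discussed above concrete, the following is a minimal illustrative sketch, not the algorithm analyzed in the paper: it instantiates the base external-regret learners with plain multiplicative weights rather than OMWU, and computes the stationary distribution with a dense least-squares solve. The names `BlumMansour`, `stationary_distribution`, and the step size `eta` are our own choices for illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution p of a row-stochastic matrix P (p = pP),
    found by least squares on (P^T - I) p = 0 with the constraint sum(p) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    p = np.clip(p, 0.0, None)  # clamp tiny negative round-off before renormalizing
    return p / p.sum()

class BlumMansour:
    """Sketch of the Blum-Mansour swap-regret reduction: one external-regret
    learner per action (plain MWU here; the paper analyzes optimistic MWU).
    Each round, play the stationary distribution of the row-stochastic matrix
    whose i-th row is learner i's current mixed strategy."""

    def __init__(self, n_actions, eta=0.2):
        self.n = n_actions
        self.eta = eta
        self.weights = np.ones((n_actions, n_actions))  # row i: learner i's weights
        self.p = np.full(n_actions, 1.0 / n_actions)

    def play(self):
        # Row-normalize the weights into a stochastic matrix, then take
        # its stationary distribution as the strategy to play.
        Q = self.weights / self.weights.sum(axis=1, keepdims=True)
        self.p = stationary_distribution(Q)
        return self.p

    def update(self, utility):
        # Learner i observes the utility vector scaled by p[i] (the mass it
        # was responsible for) and takes a multiplicative-weights step.
        u = np.asarray(utility, dtype=float)
        for i in range(self.n):
            self.weights[i] *= np.exp(self.eta * self.p[i] * u)
```

Against a fixed utility vector the played distribution concentrates on the best action, as expected from an external-regret guarantee; the swap-regret property only matters against adaptive sequences.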

[1]  Karthik Sridharan,et al.  Online Learning with Predictable Sequences , 2012, COLT.

[2]  Kevin Waugh,et al.  DeepStack: Expert-level artificial intelligence in heads-up no-limit poker , 2017, Science.

[3]  Xiaotie Deng,et al.  Settling the complexity of computing two-player Nash equilibria , 2007, JACM.

[4]  Gábor Lugosi,et al.  Prediction, learning, and games , 2006 .

[5]  S. Hart,et al.  A simple adaptive procedure leading to correlated equilibrium , 2000 .

[6]  Paul W. Goldberg,et al.  The complexity of computing a Nash equilibrium , 2006, STOC '06.

[7]  Noah Golowich,et al.  Near-Optimal No-Regret Learning in General Games , 2021, ArXiv.

[8]  Haipeng Luo,et al.  Linear Last-iterate Convergence in Constrained Saddle-point Optimization , 2020, ICLR.

[9]  Geoffrey J. Gordon,et al.  No-regret learning in convex games , 2008, ICML '08.

[10]  Binghui Peng,et al.  Hedging in games: Faster convergence of external and swap regrets , 2020, NeurIPS.

[11]  Constantinos Daskalakis,et al.  Training GANs with Optimism , 2017, ICLR.

[12]  A. Cayley  A theorem on trees , 1889 .

[13]  Constantinos Daskalakis,et al.  Near-optimal no-regret algorithms for zero-sum games , 2011, SODA '11.

[14]  Manfred K. Warmuth,et al.  The Weighted Majority Algorithm , 1994, Inf. Comput..

[15]  Kousha Etessami,et al.  On the Complexity of Nash Equilibria and Other Fixed Points (Extended Abstract) , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[16]  Haipeng Luo,et al.  Fast Convergence of Regularized Learning in Games , 2015, NIPS.

[17]  Gábor Lugosi,et al.  Learning correlated equilibria in games with compact sets of strategies , 2007, Games Econ. Behav..

[18]  Aviad Rubinstein,et al.  Settling the Complexity of Computing Approximate Two-Player Nash Equilibria , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[19]  Gábor Lugosi,et al.  Internal Regret in On-Line Portfolio Selection , 2005, Machine Learning.

[20]  Noam Brown,et al.  Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.

[21]  R. Aumann Subjectivity and Correlation in Randomized Strategies , 1974 .

[22]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[23]  Karthik Sridharan,et al.  Optimization, Learning, and Games with Predictable Sequences , 2013, NIPS.

[24]  J. Robinson  An Iterative Method of Solving a Game , 1951, Classics in Game Theory.

[25]  Constantinos Daskalakis,et al.  The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization , 2018, NeurIPS.

[26]  Christos H. Papadimitriou,et al.  Computing correlated equilibria in multi-player games , 2005, STOC '05.

[27]  Amy Greenwald,et al.  A General Class of No-Regret Learning Algorithms and Game-Theoretic Equilibria , 2003, COLT.

[28]  Dean P. Foster,et al.  Calibrated Learning and Correlated Equilibrium , 1997 .

[29]  Yishay Mansour,et al.  From External to Internal Regret , 2005, J. Mach. Learn. Res..

[30]  Constantinos Daskalakis,et al.  Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization , 2018, ITCS.

[31]  Tuomas Sandholm,et al.  Regret Circuits: Composability of Regret Minimizers , 2018, ICML.

[32]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[33]  Peter L. Bartlett,et al.  Blackwell Approachability and No-Regret Learning are Equivalent , 2010, COLT.

[34]  Alex Kruckman,et al.  An Elementary Proof of the Markov Chain Tree Theorem , 2010 .

[35]  D. Blackwell  An analog of the minimax theorem for vector payoffs , 1956 .

[36]  John C. Harsanyi,et al.  A General Theory of Equilibrium Selection in Games , 1989 .

[37]  V. Anantharam,et al.  A proof of the Markov chain tree theorem , 1989 .

[38]  J. Nash Equilibrium Points in N-Person Games. , 1950, Proceedings of the National Academy of Sciences of the United States of America.