On the Impossibility of Convergence of Mixed Strategies with No Regret Learning

We study the limiting behavior of the mixed strategies that result from optimal no-regret learning strategies in a repeated game setting where the stage game is any 2 by 2 competitive game. We consider optimal no-regret algorithms that are mean-based and monotonic in their argument. We show that, for any such algorithm, the players' mixed strategies cannot converge almost surely to any Nash equilibrium. This negative result also holds under a broad relaxation of these assumptions, which covers popular variants of Online-Mirror-Descent with optimism and/or adaptive step-sizes. Finally, we conjecture that the monotonicity assumption can be removed, and provide partial evidence for this conjecture. Our results identify the inherent stochasticity in players' realizations as a critical factor behind the difference in outcomes between updating on the opponent's mixed strategies and updating on the opponent's realized actions.
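As an illustration of the setting, the following minimal sketch simulates two players who each run Hedge (exponential weights), one familiar example of a mean-based, monotonic no-regret rule, and who update on the opponent's realized actions rather than on the opponent's mixed strategy. Everything specific here is an assumption made for the example and is not taken from the paper: the stage game (matching pennies), the step-size schedule 1/sqrt(t), and the names `hedge_prob` and `simulate`.

```python
import math
import random

# Assumed stage game: matching pennies. Entries are the row player's payoffs;
# the column player receives the negation (a 2 by 2 competitive game).
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]


def hedge_prob(cum_gain, eta):
    """Probability of playing action 0 under Hedge with two actions.

    The rule is mean-based (it depends only on cumulative gains) and
    monotonic (the probability of action 0 increases with its gain advantage).
    """
    z = eta * (cum_gain[1] - cum_gain[0])
    z = max(min(z, 700.0), -700.0)  # keep math.exp within floating-point range
    return 1.0 / (1.0 + math.exp(z))


def simulate(rounds=2000, seed=0):
    """Return the list of (p_t, q_t): each player's probability of action 0."""
    rng = random.Random(seed)
    gains_row = [0.0, 0.0]
    gains_col = [0.0, 0.0]
    history = []
    for t in range(1, rounds + 1):
        eta = 1.0 / math.sqrt(t)  # assumed step-size schedule
        p = hedge_prob(gains_row, eta)
        q = hedge_prob(gains_col, eta)
        history.append((p, q))
        # Each player samples an action from its current mixed strategy.
        a = 0 if rng.random() < p else 1
        b = 0 if rng.random() < q else 1
        # Updates use the opponent's realized action, not its mixture.
        for i in range(2):
            gains_row[i] += PAYOFF[i][b]
            gains_col[i] += -PAYOFF[a][i]
    return history


if __name__ == "__main__":
    trajectory = simulate()
    # Inspect the final mixed strategies; the unique Nash equilibrium of
    # matching pennies is (0.5, 0.5).
    print(trajectory[-1])
```

Plotting the returned trajectory for several seeds gives a rough picture of how the sampled actions feed randomness back into the mixed strategies. A simulation of this kind is only suggestive and does not substitute for the almost-sure statement in the abstract.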
