Doubly Optimal No-Regret Learning in Monotone Games

We consider online learning in multi-player smooth monotone games. Existing algorithms suffer from at least one of the following limitations: (1) they apply only to strongly monotone games; (2) they lack a no-regret guarantee; (3) they have only an asymptotic guarantee or a slow $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate to a Nash equilibrium. While the $O(\frac{1}{\sqrt{T}})$ rate is tight for a large class of algorithms, including the well-studied extragradient and optimistic gradient algorithms, it is not the optimal rate over the class of all gradient-based algorithms. We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium in multi-player smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an $O(\log T)$ individual worst-case dynamic regret, an exponential improvement over the previous state-of-the-art $O(\sqrt{T})$ bound.
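For context, the displays below recall two classical updates that the abstract refers to; they are well-known ingredients shown only as a reminder of what the optimistic gradient baseline looks like and of the standard anchoring mechanism behind accelerated $O(\frac{1}{T})$ rates for smooth monotone problems. They are not the AOG update itself, which is not specified in this abstract, and the notation ($F$ for the concatenated gradient operator of the game, $\eta$ for a step size, $\Pi_{\mathcal{X}}$ for Euclidean projection onto the feasible set, $T$ for a nonexpansive map) is ours.

Optimistic gradient (Popov, 1980), a single operator evaluation per round:
\[
  \hat{x}_{t} = \Pi_{\mathcal{X}}\!\big[x_{t} - \eta F(\hat{x}_{t-1})\big],
  \qquad
  x_{t+1} = \Pi_{\mathcal{X}}\!\big[x_{t} - \eta F(\hat{x}_{t})\big].
\]
Halpern iteration (Halpern, 1967), which anchors each step toward the initial point $x_0$:
\[
  x_{t+1} = \frac{1}{t+2}\,x_{0} + \Big(1 - \frac{1}{t+2}\Big)\,T(x_{t}).
\]
Halpern-style anchoring is the standard mechanism behind $O(\frac{1}{T})$ last-iterate rates for smooth monotone operators, which makes it a natural ingredient for accelerating an optimistic gradient method.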
