Doubly Optimal No-Regret Learning in Monotone Games

We consider online learning in multi-player smooth monotone games. Existing algorithms suffer from at least one of the following limitations: (1) they apply only to strongly monotone games; (2) they lack a no-regret guarantee; (3) they have only an asymptotic guarantee or a slow $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate to a Nash equilibrium. While the $O(\frac{1}{\sqrt{T}})$ rate is tight for a large class of algorithms, including the well-studied extragradient and optimistic gradient algorithms, it is not the optimal rate over the class of all gradient-based algorithms. We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium in multi-player smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an $O(\log T)$ individual worst-case dynamic regret, an exponential improvement over the previous state-of-the-art $O(\sqrt{T})$ bound.
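For context, the displays below recall two classical updates that the abstract refers to; they are well-known ingredients shown only as a reminder of what the optimistic gradient baseline looks like and of the standard anchoring mechanism behind accelerated $O(\frac{1}{T})$ rates for smooth monotone problems. They are not the AOG update itself, which is not specified in this abstract, and the notation ($F$ for the concatenated gradient operator of the game, $\eta$ for a step size, $\Pi_{\mathcal{X}}$ for Euclidean projection onto the feasible set, $T$ for a nonexpansive map) is ours.

Optimistic gradient (Popov, 1980), a single operator evaluation per round:
\[
  \hat{x}_{t} = \Pi_{\mathcal{X}}\!\big[x_{t} - \eta F(\hat{x}_{t-1})\big],
  \qquad
  x_{t+1} = \Pi_{\mathcal{X}}\!\big[x_{t} - \eta F(\hat{x}_{t})\big].
\]
Halpern iteration (Halpern, 1967), which anchors each step toward the initial point $x_0$:
\[
  x_{t+1} = \frac{1}{t+2}\,x_{0} + \Big(1 - \frac{1}{t+2}\Big)\,T(x_{t}).
\]
Halpern-style anchoring is the standard mechanism behind $O(\frac{1}{T})$ last-iterate rates for smooth monotone operators, which makes it a natural ingredient for accelerating an optimistic gradient method.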
