Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations

Several widely used first-order saddle point optimization methods yield an identical continuous-time ordinary differential equation (ODE) to that of the Gradient Descent Ascent (GDA) method when derived naively. However, their convergence properties differ sharply, even on simple bilinear games. We use a technique from fluid dynamics, called High-Resolution Differential Equations (HRDEs), to design ODEs for several saddle point optimization methods. On bilinear games, the convergence properties of the derived HRDEs match those of the starting discrete methods. Using these techniques, we show that the HRDE of Optimistic Gradient Descent Ascent (OGDA) exhibits last-iterate convergence for general monotone variational inequalities. To our knowledge, this is the first continuous-time dynamics shown to converge in such a general setting. Moreover, we provide rates for the best-iterate convergence of the OGDA method, relying solely on first-order smoothness of the monotone operator.
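To make the contrast between the discrete methods concrete, the following is a minimal sketch (not the paper's code) comparing the last iterates of GDA and OGDA on the bilinear game f(x, y) = xy, whose unique saddle point is the origin; the step size, horizon, and initial point are illustrative assumptions. GDA spirals away from the saddle point, while OGDA converges to it, which is the gap that the naive shared ODE fails to capture and the HRDEs do.

```python
# Illustrative sketch (assumed parameters, not from the paper): compare the
# last iterates of GDA and OGDA on the bilinear game f(x, y) = x * y, whose
# unique saddle point is the origin.
import numpy as np

def vector_field(z):
    # Monotone operator F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y.
    x, y = z
    return np.array([y, -x])

def gda(z0, eta=0.1, steps=500):
    # Gradient Descent Ascent: z_{k+1} = z_k - eta * F(z_k).
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - eta * vector_field(z)
    return z

def ogda(z0, eta=0.1, steps=500):
    # Optimistic GDA: z_{k+1} = z_k - 2*eta*F(z_k) + eta*F(z_{k-1}).
    z_prev = np.array(z0, dtype=float)
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z_next = z - 2 * eta * vector_field(z) + eta * vector_field(z_prev)
        z_prev, z = z, z_next
    return z

z0 = [1.0, 1.0]
print("GDA  last-iterate distance to saddle:", np.linalg.norm(gda(z0)))   # grows: GDA diverges
print("OGDA last-iterate distance to saddle:", np.linalg.norm(ogda(z0)))  # shrinks: OGDA converges
```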
