Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations

Several widely-used first-order saddle-point optimization methods yield the same continuous-time ordinary differential equation (ODE) as the Gradient Descent Ascent (GDA) method when derived naively. However, the convergence properties of these methods are qualitatively different, even on simple bilinear games. Thus the ODE perspective, which has proved powerful in analyzing single-objective optimization methods, has not played a similar role in saddle-point optimization. We adopt a framework studied in fluid dynamics -- known as High-Resolution Differential Equations (HRDEs) -- to design differential equation models for several saddle-point optimization methods. Critically, these HRDEs are distinct across the different saddle-point optimization methods. Moreover, in bilinear games, the convergence properties of the HRDEs match the qualitative features of the corresponding discrete methods. Additionally, we show that the HRDE of Optimistic Gradient Descent Ascent (OGDA) exhibits \emph{last-iterate convergence} for general monotone variational inequalities. Finally, we provide rates for the \emph{best-iterate convergence} of the OGDA method, relying solely on the first-order smoothness of the monotone operator.
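As a concrete illustration of the gap the abstract points to, the sketch below (not part of the paper; the step size and function names are ours) simulates GDA and OGDA on the bilinear game f(x, y) = x*y, whose saddle point is the origin and whose associated operator is F(x, y) = (y, -x). Both methods discretize the same naive ODE z' = -F(z), yet GDA spirals away from the origin while OGDA converges to it.

# Minimal numerical sketch (assumed setup, not taken from the paper):
# GDA vs. OGDA on the bilinear game f(x, y) = x * y, saddle point at 0.
import numpy as np

def F(z):
    # Operator of the game: descent in x, ascent in y.
    x, y = z
    return np.array([y, -x])

def run_gda(z0, step=0.1, iters=500):
    z = z0.copy()
    for _ in range(iters):
        z = z - step * F(z)  # plain gradient descent ascent step
    return np.linalg.norm(z)

def run_ogda(z0, step=0.1, iters=500):
    z, z_prev = z0.copy(), z0.copy()
    for _ in range(iters):
        # Optimistic step: extrapolate using the previous operator value.
        z_next = z - 2 * step * F(z) + step * F(z_prev)
        z_prev, z = z, z_next
    return np.linalg.norm(z)

if __name__ == "__main__":
    z0 = np.array([1.0, 1.0])
    print("GDA  ||z_500|| =", run_gda(z0))   # grows: GDA spirals outward
    print("OGDA ||z_500|| =", run_ogda(z0))  # shrinks: OGDA converges

Running the script shows the iterate norm growing under GDA and shrinking under OGDA, a qualitative difference that the shared naive ODE cannot capture but that the method-specific HRDEs do.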
