Potential Function-based Framework for Making the Gradients Small in Convex and Min-Max Optimization

Making the gradients small is a fundamental optimization problem that has so far eluded the unifying and simple convergence arguments of first-order optimization, which have primarily been reserved for other convergence criteria, such as reducing the optimality gap. We introduce a novel potential function-based framework for studying the convergence of standard methods that make the gradients small in smooth convex optimization and convex-concave min-max optimization. The framework is intuitive: it provides a lens for viewing algorithms that make the gradients small as being driven by a trade-off between reducing either the gradient norm or a certain notion of optimality gap. On the lower-bounds side, we discuss the tightness of the obtained convergence results in the convex setup and provide a new lower bound for minimizing the norm of cocoercive operators, which allows us to argue about the optimality of methods in the min-max setup.
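
To make the trade-off concrete, here is a minimal sketch of the kind of potential function such an analysis might track; the coefficients $a_k$, $b_k$ and the specific form below are illustrative assumptions, not the paper's exact construction. For an $L$-smooth convex function $f$ with minimizer $x^*$ and iterates $x_k$, consider

\[ \mathcal{P}_k \;=\; a_k\,\|\nabla f(x_k)\|^2 \;+\; b_k\,\big(f(x_k) - f(x^*)\big), \qquad a_k, b_k \ge 0. \]

If the method can be shown to keep $\mathcal{P}_k$ non-increasing while $a_k$ grows, then $\|\nabla f(x_K)\|^2 \le \mathcal{P}_0 / a_K$, so the rate at which $a_k$ can be increased against the budget provided by the optimality-gap term dictates how fast the gradient norm shrinks.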
