Potential Function-based Framework for Making the Gradients Small in Convex and Min-Max Optimization

Making the gradients small is a fundamental optimization problem that has so far eluded the unifying and simple convergence arguments of first-order optimization, which have primarily been reserved for other convergence criteria, such as reducing the optimality gap. We introduce a novel potential function-based framework for studying the convergence of standard methods that make the gradients small in smooth convex optimization and convex-concave min-max optimization. The framework is intuitive: it provides a lens for viewing algorithms that make the gradients small as being driven by a trade-off between reducing either the gradient norm or a certain notion of optimality gap. On the lower-bounds side, we discuss the tightness of the obtained convergence results in the convex setup and provide a new lower bound for minimizing the norm of cocoercive operators, which allows us to argue about the optimality of methods in the min-max setup.
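
To make the trade-off concrete, here is a minimal sketch of the kind of potential function such an analysis might track; the coefficients $a_k$, $b_k$ and the specific form below are illustrative assumptions, not the paper's exact construction. For an $L$-smooth convex function $f$ with minimizer $x^*$ and iterates $x_k$, consider

\[ \mathcal{P}_k \;=\; a_k\,\|\nabla f(x_k)\|^2 \;+\; b_k\,\big(f(x_k) - f(x^*)\big), \qquad a_k, b_k \ge 0. \]

If the method can be shown to keep $\mathcal{P}_k$ non-increasing while $a_k$ grows, then $\|\nabla f(x_K)\|^2 \le \mathcal{P}_0 / a_K$, so the rate at which $a_k$ can be increased against the budget provided by the optimality-gap term dictates how fast the gradient norm shrinks.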
