Generalized Momentum-Based Methods: A Hamiltonian Perspective

We take a Hamiltonian-based perspective to generalize Nesterov's accelerated gradient descent and Polyak's heavy ball method to a broad class of momentum methods in the setting of (possibly) constrained minimization in Euclidean and non-Euclidean normed vector spaces. Our perspective leads to a generic and unifying nonasymptotic analysis of the convergence of these methods both in function value (in the setting of convex optimization) and in the norm of the gradient (in the setting of unconstrained, possibly nonconvex, optimization). Our approach relies upon a time-varying Hamiltonian that produces generalized momentum methods as its equations of motion. The convergence analysis for these methods is intuitive and is based on the conserved quantities of the time-dependent Hamiltonian.
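To make the construction concrete, here is a minimal sketch in the unconstrained Euclidean case; the specific weight functions a(t) and b(t) and this particular Hamiltonian form are illustrative assumptions, not necessarily the exact construction used in the paper. A prototypical time-varying Hamiltonian is

    H(x, p, t) = a(t) f(x) + (1 / (2 b(t))) ||p||^2,

whose equations of motion

    dx/dt = ∂H/∂p = p / b(t),        dp/dt = -∂H/∂x = -a(t) ∇f(x)

combine into the second-order momentum dynamics d/dt ( b(t) dx/dt ) = -a(t) ∇f(x). For example, taking a(t) = b(t) = e^{γ t} yields the heavy-ball flow ẍ + γ ẋ + ∇f(x) = 0, while taking a(t) = b(t) = t^3 yields the Nesterov-type flow ẍ + (3/t) ẋ + ∇f(x) = 0; different choices of the weights and of the discretization then give rise to different members of the generalized family of momentum methods.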
