A Lyapunov Analysis of Momentum Methods in Optimization

Momentum methods play a significant role in optimization. Examples include Nesterov's accelerated gradient method and the conditional gradient algorithm. Several momentum methods are provably optimal under standard oracle models, and their convergence analyses all rely on a technique called estimate sequences. Estimate sequences have long been considered difficult to understand, leading many researchers to propose alternative, "more intuitive" methods and analyses. We show that there is an equivalence between the technique of estimate sequences and a family of Lyapunov functions, in both continuous and discrete time. This connection allows us to develop a simple and unified analysis of many existing momentum algorithms, to introduce several new algorithms, and to strengthen the connection between algorithms and continuous-time dynamical systems.
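As a concrete illustration of the Lyapunov side of this equivalence (a standard continuous-time example in the spirit of the paper, not its exact notation): for a convex differentiable f with minimizer x^*, the ODE limit of Nesterov's accelerated gradient method is

\[ \ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0 , \]

and the function

\[ \mathcal{E}(t) = t^2 \bigl( f(X(t)) - f(x^*) \bigr) + 2 \Bigl\| X(t) + \tfrac{t}{2}\dot{X}(t) - x^* \Bigr\|^2 \]

is a Lyapunov function for these dynamics: differentiating along trajectories and substituting the ODE gives

\[ \dot{\mathcal{E}}(t) = 2t \bigl( f(X(t)) - f(x^*) \bigr) - 2t \bigl\langle \nabla f(X(t)),\, X(t) - x^* \bigr\rangle \le 0 , \]

where the inequality is exactly convexity of f. Since \mathcal{E} is nonincreasing, t^2 ( f(X(t)) - f(x^*) ) \le \mathcal{E}(0) = 2\|x_0 - x^*\|^2, recovering the accelerated O(1/t^2) rate. Discrete-time Lyapunov functions of the same form certify Nesterov's method itself, and the equivalence described above lets estimate-sequence arguments be read as constructions of precisely such functions.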
