A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

We derive a second-order ordinary differential equation (ODE) that is the limit of Nesterov's accelerated gradient method as the step size tends to zero. This ODE is approximately equivalent to Nesterov's scheme and can therefore serve as a tool for analysis. We show that the continuous-time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov's scheme, leading to an algorithm that can be rigorously proven to converge at a linear rate whenever the objective is strongly convex.
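The correspondence between the scheme and the ODE can be illustrated numerically. The sketch below is not the authors' code. It assumes the ODE in the form given in the body of the paper, X'' + (3/t) X' + grad f(X) = 0 with X(0) = x0 and X'(0) = 0, and the identification t ≈ k·sqrt(s) for step size s. The quadratic test objective, the function names, the RK4 integrator, and the small positive start time t0 (used to sidestep the 3/t singularity at t = 0) are all illustrative assumptions.

```python
import math

# Hypothetical test objective: f(x) = 0.02*x1^2 + 0.005*x2^2, minimized at the origin.
def f(x):
    return 0.02 * x[0] ** 2 + 0.005 * x[1] ** 2

def grad(x):
    return [0.04 * x[0], 0.01 * x[1]]

def nesterov(x0, s, n_iter):
    """Nesterov's scheme: x_k = y_{k-1} - s*grad f(y_{k-1}),
    y_k = x_k + (k-1)/(k+2) * (x_k - x_{k-1}), with y_0 = x_0."""
    x_prev, y = list(x0), list(x0)
    xs = [x_prev]
    for k in range(1, n_iter + 1):
        g = grad(y)
        x = [yi - s * gi for yi, gi in zip(y, g)]
        y = [xi + (k - 1) / (k + 2) * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev = x
        xs.append(x)
    return xs

def rhs(t, z):
    """First-order form of X'' + (3/t) X' + grad f(X) = 0; z = [X1, X2, V1, V2]."""
    x, v = z[:2], z[2:]
    g = grad(x)
    return v + [-(3.0 / t) * vi - gi for vi, gi in zip(v, g)]

def rk4_step(t, z, h):
    k1 = rhs(t, z)
    k2 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
    k3 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
    k4 = rhs(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
    return [zi + h / 6 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def integrate_ode(x0, t_end, h=0.01, t0=0.05):
    """Integrate from a small t0 > 0 (an assumption made here to avoid
    dividing by zero), starting at rest at x0."""
    z, t = list(x0) + [0.0, 0.0], t0
    while t < t_end - 1e-12:
        step = min(h, t_end - t)
        z = rk4_step(t, z, step)
        t += step
    return z[:2]

if __name__ == "__main__":
    x0, s, k = [1.0, 1.0], 1.0, 20
    x_k = nesterov(x0, s, k)[k]
    X_t = integrate_ode(x0, k * math.sqrt(s))
    print("Nesterov iterate x_k:   ", x_k)
    print("ODE solution X(k*sqrt(s)):", X_t)
```

Shrinking s while holding t = k·sqrt(s) fixed should bring the two printed points closer together, which is the sense in which the ODE is the limit of the scheme.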
