Continuous-time Lower Bounds for Gradient-based Algorithms

This article derives lower bounds on the convergence rate of continuous-time gradient-based optimization algorithms. To make continuous-time convergence rates meaningful, the algorithms are subject to a time-normalization constraint, which rules out apparent speed-ups obtained merely by reparametrizing time. We reduce the multi-dimensional problem to a single dimension, recover well-known lower bounds from the discrete-time setting, and provide insight into why these lower bounds arise. We also give explicit algorithms that achieve the proposed lower bounds, even when the function class under consideration includes certain non-convex functions.
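The need for a time normalization can be illustrated with a small sketch. It is not taken from the article: the quadratic test function, the `speed` factor, and the helper name `gradient_flow_1d` are assumptions chosen for illustration. Scaling the gradient flow x'(t) = -f'(x(t)) by a constant c amounts to running the same flow on a faster clock, so without a normalization any convergence rate could be claimed.

```python
import math


def gradient_flow_1d(curvature, speed, x0, t_end, steps=100_000):
    """Integrate x'(t) = -speed * f'(x) with explicit Euler,
    for the assumed test function f(x) = curvature / 2 * x**2."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x -= dt * speed * curvature * x  # f'(x) = curvature * x
    return x


curvature, x0, t_end = 1.0, 1.0, 1.0
speeds = (1.0, 10.0)

# Numerical solution at time t_end for each time scaling.
results = {c: gradient_flow_1d(curvature, c, x0, t_end) for c in speeds}

# Closed form: x(t) = x0 * exp(-speed * curvature * t).
exact = {c: x0 * math.exp(-c * curvature * t_end) for c in speeds}

for c in speeds:
    print(f"speed={c:5.1f}  x(t_end)={results[c]:.3e}  exact={exact[c]:.3e}")
```

Both runs follow the same trajectory; the second simply traverses it ten times faster. A statement such as "converges at rate exp(-t)" therefore carries no information unless the time scale is fixed, which is the role the abstract assigns to the time-normalization constraint.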
