Acceleration in First-Order Quasi-Strongly Convex Optimization by ODE Discretization

We study gradient-based optimization methods obtained by direct Runge-Kutta discretization of the ordinary differential equation (ODE) describing the motion of a heavy ball under a constant friction coefficient. When the function is smooth of high order and strongly convex, we show that directly simulating the ODE with standard numerical integrators achieves acceleration in a nontrivial neighborhood of the optimal solution. In particular, this neighborhood may grow larger as the condition number of the function increases. Furthermore, our results also hold for objectives that are nonconvex but quasi-strongly convex. We provide numerical experiments that verify the theoretical rates predicted by our results.
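
To make the construction concrete, below is a minimal sketch in Python of the scheme the abstract describes: the heavy-ball ODE x''(t) + gamma * x'(t) + grad f(x(t)) = 0 is rewritten as a first-order system in z = (x, v) and simulated with the classical fourth-order Runge-Kutta integrator. The quadratic objective, the friction choice gamma = 2 * sqrt(mu), the step size, and all names here are illustrative assumptions, not the paper's exact parameters or code.

    import numpy as np

    def heavy_ball_field(z, grad_f, gamma):
        # z stacks position x (row 0) and velocity v (row 1); the second-order
        # ODE x'' + gamma x' + grad f(x) = 0 becomes z' = (v, -gamma v - grad f(x)).
        x, v = z
        return np.stack([v, -gamma * v - grad_f(x)])

    def rk4_step(z, h, grad_f, gamma):
        # One classical fourth-order Runge-Kutta step of size h.
        f = lambda w: heavy_ball_field(w, grad_f, gamma)
        k1 = f(z)
        k2 = f(z + 0.5 * h * k1)
        k3 = f(z + 0.5 * h * k2)
        k4 = f(z + h * k3)
        return z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

    # Illustrative run on a strongly convex quadratic f(x) = 0.5 * x^T A x.
    A = np.diag([1.0, 100.0])                 # condition number 100 (assumption)
    grad_f = lambda x: A @ x
    mu = 1.0                                  # strong convexity parameter of f
    gamma = 2.0 * np.sqrt(mu)                 # constant friction (illustrative choice)
    h = 0.05                                  # step size chosen small enough for stability
    z = np.stack([np.ones(2), np.zeros(2)])   # start at x0 = (1, 1) with zero velocity
    for _ in range(200):
        z = rk4_step(z, h, grad_f, gamma)
    print(z[0])                               # approximate minimizer; should be near the origin

The sketch only illustrates the direct discretization itself; the accelerated rate in the paper comes from the analysis of higher-order integrators with appropriately scaled step sizes, which this toy example does not reproduce.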
