Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

We consider the minimization of a non-convex quadratic form regularized by a cubic term, i.e., $f(x) = \frac{1}{2}x^\top A x + b^\top x + \frac{\rho}{3}\|x\|^3$ with $A$ symmetric but possibly indefinite and $\rho > 0$; such functions may exhibit saddle points and a suboptimal local minimum. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the \emph{global minimum} to within $\varepsilon$ accuracy in $O(\varepsilon^{-1}\log(1/\varepsilon))$ steps for large $\varepsilon$ and $O(\log(1/\varepsilon))$ steps for small $\varepsilon$ (where "large" and "small" are relative to a condition number we define), with at most logarithmic dependence on the problem dimension. When we use gradient descent to approximate the Nesterov–Polyak cubic-regularized Newton step, our result implies a rate of convergence to second-order stationary points of general smooth non-convex functions.
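To make the setup concrete, below is a minimal NumPy sketch of plain gradient descent on the objective $f$ above. The constant step-size heuristic, the initialization at the origin, and the crude radius bound are our own simplifying assumptions for illustration; the paper's analysis prescribes its own step size and a carefully chosen initialization (perturbed to handle hard cases), so treat this as a sketch of the technique rather than the paper's exact algorithm.

import numpy as np

def cubic_reg_gd(A, b, rho, eta=None, max_steps=10_000, tol=1e-10):
    # Gradient descent on f(x) = 0.5*x'Ax + b'x + (rho/3)*||x||^3,
    # the cubic-regularized quadratic form described in the abstract.
    # Illustrative assumptions (ours, not the paper's): start at the
    # origin and use a conservative constant step size.
    x = np.zeros_like(b)
    if eta is None:
        beta = np.linalg.norm(A, 2)  # spectral norm of A
        # Crude a priori bound on the norm of the global minimizer,
        # used only to pick a step size below the local smoothness.
        R = beta / rho + np.sqrt(np.linalg.norm(b) / rho)
        eta = 1.0 / (2.0 * (beta + 2.0 * rho * R))
    for _ in range(max_steps):
        grad = A @ x + b + rho * np.linalg.norm(x) * x  # gradient of f at x
        if np.linalg.norm(grad) <= tol:
            break
        x = x - eta * grad
    return x

# Toy usage: an indefinite A makes the quadratic part non-convex.
A = np.diag([1.0, -0.5])
b = np.array([1.0, 0.1])
x_hat = cubic_reg_gd(A, b, rho=1.0)
residual = np.linalg.norm(A @ x_hat + b + np.linalg.norm(x_hat) * x_hat)
print(x_hat, residual)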
