Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

We consider the minimization of a non-convex quadratic form regularized by a cubic term, i.e., $f(x) = \frac{1}{2}x^\top A x + b^\top x + \frac{\rho}{3}\|x\|^3$ with $A$ symmetric but possibly indefinite and $\rho > 0$; such functions may exhibit saddle points and a suboptimal local minimum. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the \emph{global minimum} to within $\varepsilon$ accuracy in $O(\varepsilon^{-1}\log(1/\varepsilon))$ steps for large $\varepsilon$ and $O(\log(1/\varepsilon))$ steps for small $\varepsilon$ (where "large" and "small" are relative to a condition number we define), with at most logarithmic dependence on the problem dimension. When we use gradient descent to approximate the Nesterov–Polyak cubic-regularized Newton step, our result implies a rate of convergence to second-order stationary points of general smooth non-convex functions.
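To make the setup concrete, below is a minimal NumPy sketch of plain gradient descent on the objective $f$ above. The constant step-size heuristic, the initialization at the origin, and the crude radius bound are our own simplifying assumptions for illustration; the paper's analysis prescribes its own step size and a carefully chosen initialization (perturbed to handle hard cases), so treat this as a sketch of the technique rather than the paper's exact algorithm.

import numpy as np

def cubic_reg_gd(A, b, rho, eta=None, max_steps=10_000, tol=1e-10):
    # Gradient descent on f(x) = 0.5*x'Ax + b'x + (rho/3)*||x||^3,
    # the cubic-regularized quadratic form described in the abstract.
    # Illustrative assumptions (ours, not the paper's): start at the
    # origin and use a conservative constant step size.
    x = np.zeros_like(b)
    if eta is None:
        beta = np.linalg.norm(A, 2)  # spectral norm of A
        # Crude a priori bound on the norm of the global minimizer,
        # used only to pick a step size below the local smoothness.
        R = beta / rho + np.sqrt(np.linalg.norm(b) / rho)
        eta = 1.0 / (2.0 * (beta + 2.0 * rho * R))
    for _ in range(max_steps):
        grad = A @ x + b + rho * np.linalg.norm(x) * x  # gradient of f at x
        if np.linalg.norm(grad) <= tol:
            break
        x = x - eta * grad
    return x

# Toy usage: an indefinite A makes the quadratic part non-convex.
A = np.diag([1.0, -0.5])
b = np.array([1.0, 0.1])
x_hat = cubic_reg_gd(A, b, rho=1.0)
residual = np.linalg.norm(A @ x_hat + b + np.linalg.norm(x_hat) * x_hat)
print(x_hat, residual)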
