Cubic Regularization with Momentum for Nonconvex Optimization

Momentum is a popular technique for accelerating convergence in practical training, and its impact on convergence guarantees has been well studied for first-order algorithms. However, such a successful acceleration technique has not yet been proposed for second-order algorithms in nonconvex optimization. In this paper, we apply the momentum scheme to the cubic regularized (CR) Newton's method and explore its potential for acceleration. Our numerical experiments on various nonconvex optimization problems demonstrate that the momentum scheme substantially accelerates the convergence of cubic regularization and performs even better than Nesterov's acceleration scheme for CR. Theoretically, we prove that CR with momentum achieves the best possible convergence rate to a second-order stationary point for nonconvex optimization. Moreover, we study the proposed algorithm on problems satisfying an error bound condition and establish a local quadratic convergence rate. Finally, for finite-sum problems, we show that the proposed algorithm can tolerate computational inexactness, which reduces the overall sample complexity without degrading the convergence rate.
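
The abstract describes the method only at a high level. Below is a minimal numerical sketch, assuming a heavy-ball-style extrapolation applied before each cubic-regularized Newton step and a simple gradient-descent solver for the cubic subproblem; the paper's exact momentum coupling, parameter choices, and inexactness conditions are not reproduced here, and all function and parameter names are illustrative.

```python
import numpy as np

def cubic_subproblem(g, H, M, iters=200, lr=1e-2):
    """Approximately minimize the cubic model
        m(s) = g^T s + (1/2) s^T H s + (M/6) ||s||^3
    by plain gradient descent on s (a simple illustrative solver)."""
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

def cr_with_momentum(grad, hess, x0, M=10.0, beta=0.5, iters=50):
    """Cubic-regularized Newton iteration with a heavy-ball-style momentum
    extrapolation; an illustrative sketch, not the paper's exact scheme."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)        # momentum extrapolation point
        s = cubic_subproblem(grad(y), hess(y), M)
        x_prev, x = x, y + s               # CR Newton step taken from y
    return x

# Toy nonconvex example: f(x) = sum(x_i^4 - x_i^2), whose local minimizers
# lie at x_i = +/- 1/sqrt(2); the origin has strictly negative curvature.
if __name__ == "__main__":
    f_grad = lambda x: 4.0 * x**3 - 2.0 * x
    f_hess = lambda x: np.diag(12.0 * x**2 - 2.0)
    print(cr_with_momentum(f_grad, f_hess, np.array([0.3, -0.2])))
```

In practice one would tune beta and M (or adapt M as in adaptive cubic regularization), and for finite-sum problems replace grad and hess with subsampled estimates to realize the inexact variant discussed in the abstract.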
