Stochastic Variance-Reduced Cubic Regularization Methods

We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of SVRC is a novel semi-stochastic gradient, along with a semi-stochastic Hessian, both specifically designed for the cubic regularization method. For a nonconvex function with n component functions, we show that our algorithm is guaranteed to converge to an (ε, √ε)-approximate local minimum within Õ(n^{4/5}/ε^{3/2}) second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms, including subsampled cubic regularization. To further reduce the sample complexity of Hessian matrix computation in cubic regularization based methods, we also propose a sample-efficient stochastic variance-reduced cubic regularization (Lite-SVRC) algorithm for finding the local minimum more efficiently. Lite-SVRC converges to an (ε, √ε)-approximate local minimum within Õ(n + n^{4/5}/ε^{3/2}) Hessian sample complexity, which is faster than all existing cubic regularization based methods. Numerical experiments on a variety of nonconvex optimization problems with real datasets validate our theoretical results for both SVRC and Lite-SVRC.
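
Below is a minimal sketch of one SVRC-style epoch, assuming SVRG-style semi-stochastic gradient and Hessian estimators (a full gradient/Hessian at a reference point plus minibatch correction terms) plugged into a cubic-regularized Newton step. The helper names (grad_i, hess_i), the batch sizes, and the simple gradient-descent solver for the cubic subproblem are illustrative assumptions, not the exact estimators or subproblem solver analyzed in the paper.

```python
import numpy as np

def cubic_subproblem(g, H, M, iters=100, lr=0.01):
    """Approximately minimize <g, h> + 0.5 h^T H h + (M/6) ||h||^3 by gradient descent."""
    h = np.zeros_like(g)
    for _ in range(iters):
        grad_model = g + H @ h + 0.5 * M * np.linalg.norm(h) * h
        h -= lr * grad_model
    return h

def svrc_epoch(x_ref, x, grad_i, hess_i, n, M, b_g, b_h, T, rng):
    """One inner loop: full gradient/Hessian at the reference point, minibatch corrections at the iterate."""
    full_g = sum(grad_i(i, x_ref) for i in range(n)) / n
    full_H = sum(hess_i(i, x_ref) for i in range(n)) / n
    for _ in range(T):
        I = rng.choice(n, b_g, replace=False)
        J = rng.choice(n, b_h, replace=False)
        # Semi-stochastic gradient: full gradient anchor plus a minibatch difference term.
        v = full_g + sum(grad_i(i, x) - grad_i(i, x_ref) for i in I) / b_g
        # Semi-stochastic Hessian: full Hessian anchor plus a minibatch difference term.
        U = full_H + sum(hess_i(i, x) - hess_i(i, x_ref) for i in J) / b_h
        # Cubic-regularized Newton step built from the variance-reduced estimates.
        x = x + cubic_subproblem(v, U, M)
    return x
```

Under these assumptions, an outer loop would call svrc_epoch repeatedly, resetting the reference point x_ref to the latest iterate after each epoch, so that the full gradient and Hessian are recomputed only once per epoch.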
