Stochastic Variance-Reduced Cubic Regularization Methods

We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of SVRC is a novel semi-stochastic gradient, along with a semi-stochastic Hessian, both specifically designed for the cubic regularization method. For a nonconvex function with n component functions, we show that our algorithm is guaranteed to converge to an (ε, √ε)-approximate local minimum within Õ(n^{4/5}/ε^{3/2}) second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms, including subsampled cubic regularization. To further reduce the sample complexity of Hessian matrix computation in cubic regularization based methods, we also propose a sample-efficient stochastic variance-reduced cubic regularization (Lite-SVRC) algorithm for finding the local minimum more efficiently. Lite-SVRC converges to an (ε, √ε)-approximate local minimum within Õ(n + n^{4/5}/ε^{3/2}) Hessian sample complexity, which is faster than all existing cubic regularization based methods. Numerical experiments on a variety of nonconvex optimization problems with real datasets validate our theoretical results for both SVRC and Lite-SVRC.
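
Below is a minimal sketch of one SVRC-style epoch, assuming SVRG-style semi-stochastic gradient and Hessian estimators (a full gradient/Hessian at a reference point plus minibatch correction terms) plugged into a cubic-regularized Newton step. The helper names (grad_i, hess_i), the batch sizes, and the simple gradient-descent solver for the cubic subproblem are illustrative assumptions, not the exact estimators or subproblem solver analyzed in the paper.

```python
import numpy as np

def cubic_subproblem(g, H, M, iters=100, lr=0.01):
    """Approximately minimize <g, h> + 0.5 h^T H h + (M/6) ||h||^3 by gradient descent."""
    h = np.zeros_like(g)
    for _ in range(iters):
        grad_model = g + H @ h + 0.5 * M * np.linalg.norm(h) * h
        h -= lr * grad_model
    return h

def svrc_epoch(x_ref, x, grad_i, hess_i, n, M, b_g, b_h, T, rng):
    """One inner loop: full gradient/Hessian at the reference point, minibatch corrections at the iterate."""
    full_g = sum(grad_i(i, x_ref) for i in range(n)) / n
    full_H = sum(hess_i(i, x_ref) for i in range(n)) / n
    for _ in range(T):
        I = rng.choice(n, b_g, replace=False)
        J = rng.choice(n, b_h, replace=False)
        # Semi-stochastic gradient: full gradient anchor plus a minibatch difference term.
        v = full_g + sum(grad_i(i, x) - grad_i(i, x_ref) for i in I) / b_g
        # Semi-stochastic Hessian: full Hessian anchor plus a minibatch difference term.
        U = full_H + sum(hess_i(i, x) - hess_i(i, x_ref) for i in J) / b_h
        # Cubic-regularized Newton step built from the variance-reduced estimates.
        x = x + cubic_subproblem(v, U, M)
    return x
```

Under these assumptions, an outer loop would call svrc_epoch repeatedly, resetting the reference point x_ref to the latest iterate after each epoch, so that the full gradient and Hessian are recomputed only once per epoch.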
