Stochastic Cubic Regularization for Fast Nonconvex Optimization
Nilesh Tripuraneni | Mitchell Stern | Chi Jin | Jeffrey Regier | Michael I. Jordan
[1] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[2] Magnus Rattray, et al. Natural gradient descent for on-line learning, 1998.
[3] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, 2004, Applied Optimization.
[4] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[5] H. Robbins, et al. A Stochastic Approximation Method, 1951, Annals of Mathematical Statistics.
[6] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[7] James Martens. Deep learning via Hessian-free optimization, 2010, ICML.
[8] Nicholas I. M. Gould, et al. Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity, 2011, Math. Program.
[9] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[10] Gábor Lugosi, et al. Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013, Oxford University Press.
[11] Joel A. Tropp. An Introduction to Matrix Concentration Inequalities, 2015, Found. Trends Mach. Learn.
[12] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[13] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[14] John Wright, et al. A Geometric Analysis of Phase Retrieval, 2016, IEEE International Symposium on Information Theory (ISIT).
[15] Prateek Jain, et al. Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm, 2016, COLT.
[16] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[17] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2016, SIAM J. Optim.
[18] Yair Carmon, et al. Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step, 2016, arXiv.
[19] Yi Zheng, et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, 2017, ICML.
[20] John Wright, et al. Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture, 2015, IEEE Transactions on Information Theory.
[21] Daniel P. Robinson, et al. A trust region algorithm with a worst-case iteration complexity of O(ϵ^{-3/2}) for nonconvex optimization, 2016, Math. Program.
[22] Aurélien Lucchi, et al. Sub-sampled Cubic Regularization for Non-convex Optimization, 2017, ICML.
[23] Michael I. Jordan, et al. Fast Black-box Variational Inference through Stochastic Trust-Region Optimization, 2017, NIPS.
[24] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[25] Tengyu Ma, et al. Finding approximate local minima faster than gradient descent, 2016, STOC.
[26] Stephen J. Wright, et al. Complexity Analysis of Second-Order Line-Search Algorithms for Smooth Nonconvex Optimization, 2017, SIAM J. Optim.
[27] Alexander J. Smola, et al. A Generic Approach for Escaping Saddle points, 2017, AISTATS.
[28] Zeyuan Allen-Zhu. Natasha 2: Faster Non-Convex Optimization Than SGD, 2017, NeurIPS.
[29] Tianbao Yang, et al. First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time, 2017, NeurIPS.
[30] Yuanzhi Li, et al. Neon2: Finding Local Minima via First-Order Oracles, 2017, NeurIPS.
[31] Peng Xu, et al. Newton-type methods for non-convex optimization under inexact Hessian information, 2017, Math. Program.
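For orientation, the sketch below illustrates the two primitives that the cubic-regularization line of work above builds on: Pearlmutter's Hessian-vector product trick [1], and gradient descent on the cubic-regularized Newton model [4], [18]. In the stochastic setting of this paper's title, the gradient and Hessian-vector products would be replaced by minibatch estimates. This is a minimal, hypothetical JAX sketch, not the authors' implementation; the function names (hvp, cubic_newton_step) and the hyperparameters rho, lr, and iters are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def hvp(f, x, v):
    # Pearlmutter-style Hessian-vector product [1]: computes H(x) @ v
    # by forward-over-reverse autodiff, never materializing H itself.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

def cubic_newton_step(f, x, rho=1.0, lr=0.01, iters=200):
    # Gradient descent on the cubic-regularized model of f around x [4], [18]:
    #   m(s) = g^T s + 0.5 s^T H s + (rho/6) ||s||^3,
    # whose gradient is g + H s + (rho/2) ||s|| s.
    # rho, lr, and iters are illustrative assumptions, not tuned values.
    g = jax.grad(f)(x)
    s = jnp.zeros_like(x)
    for _ in range(iters):
        grad_m = g + hvp(f, x, s) + 0.5 * rho * jnp.linalg.norm(s) * s
        s = s - lr * grad_m
    return x + s

# Toy usage: a nonconvex function with a saddle point at the origin.
f = lambda x: x[0] ** 2 - x[1] ** 2 + 0.25 * x[1] ** 4
x_new = cubic_newton_step(f, jnp.array([0.1, 0.1]))
```

Because the Hessian enters the model only through products H s, the inner solve needs nothing beyond gradient and Hessian-vector-product access to f, which is what keeps this family of methods tractable in high dimensions.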