First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time

Two classes of methods have been proposed for escaping from saddle points: one uses the second-order information carried by the Hessian, and the other adds noise to the first-order information. The existing analyses of algorithms that add noise to the first-order information are quite involved and hide the essence of the added noise, which hinders further improvements of these algorithms. In this paper, we present a novel perspective on the noise-adding technique, namely that adding noise to the first-order information helps extract the negative curvature of the Hessian matrix, and we provide a formal justification of this perspective by analyzing a simple first-order procedure. More importantly, the proposed procedure enables one to design purely first-order stochastic algorithms for escaping from non-degenerate saddle points with a much better time complexity (almost linear in the problem's dimensionality). In particular, we develop a {\bf first-order stochastic algorithm}, built on our new technique and an existing algorithm that only converges to a first-order stationary point, that enjoys a time complexity of $\widetilde O(d/\epsilon^{3.5})$ for finding a nearly second-order stationary point $\mathbf{x}$ such that $\|\nabla F(\mathbf{x})\|\leq \epsilon$ and $\nabla^2 F(\mathbf{x})\succeq -\sqrt{\epsilon}I$ (with high probability), where $F(\cdot)$ denotes the objective function and $d$ is the dimensionality of the problem. To the best of our knowledge, this is the best known theoretical result for first-order algorithms in stochastic non-convex optimization, and it is competitive with, if not better than, existing stochastic algorithms that rely on second-order information.

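To make the noise-adding perspective concrete, the sketch below illustrates (in Python) the kind of simple first-order procedure the abstract refers to: starting from a small random perturbation (the injected noise), gradient differences serve as Hessian-vector products, so repeated first-order steps behave like power iteration and extract a negative-curvature direction. The function name, step size, iteration count, and perturbation radius are illustrative placeholders and assumptions on our part, not the paper's exact algorithm or constants.

```python
import numpy as np


def negative_curvature_from_gradients(grad_f, x, radius=1e-3, step=1e-2, iters=100, rng=None):
    """Illustrative first-order negative-curvature search around the point x.

    The only oracle used is the gradient grad_f. Starting from a small random
    perturbation u (the injected "noise"), each iteration forms the
    finite-difference Hessian-vector product grad_f(x + u) - grad_f(x) ~ H u
    and applies u <- u - step * (H u), i.e. one step of power iteration on
    (I - step * H). If H has a sufficiently negative eigenvalue, u aligns with
    that eigen-direction, which can then be used to escape the saddle point.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(size=x.shape)
    u *= radius / np.linalg.norm(u)       # small random start keeps the linearization valid
    g0 = grad_f(x)
    for _ in range(iters):
        hvp = grad_f(x + u) - g0          # ~ H @ u, no Hessian access needed
        u = u - step * hvp                # power-method-like update on (I - step * H)
        u *= radius / np.linalg.norm(u)   # rescale back to the perturbation radius
    # Rayleigh-quotient estimate of the curvature along u (negative => escape direction).
    curvature = float(u @ (grad_f(x + u) - g0)) / (np.linalg.norm(u) ** 2)
    return u / np.linalg.norm(u), curvature


# Toy usage on f(x) = x[0]^2 - x[1]^2, whose origin is a strict saddle point.
grad_f = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
direction, curvature = negative_curvature_from_gradients(grad_f, np.zeros(2))
print(direction, curvature)   # direction ~ (0, +/-1), curvature ~ -2
```

In this sketch the random perturbation plays exactly the role described in the abstract: it supplies a starting vector with a nonzero component along the negative eigen-direction, and purely first-order operations amplify that component without ever forming the Hessian.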