论文信息 - Non-convex Finite-Sum Optimization Via SCSG Methods - 字舞流文

Non-convex Finite-Sum Optimization Via SCSG Methods

We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods (Lei and Jordan, 2016), for the smooth non-convex finite-sum optimization problem. Assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $\mathbb{E} \|\nabla f(x)\|^{2}\le \epsilon$ is $O\left (\min\{\epsilon^{-5/3}, \epsilon^{-1}n^{2/3}\}\right)$, which strictly outperforms the stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Lojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layers neural networks in terms of both training and validation loss.

Michael I. Jordan | Lihua Lei | Cheng Ju | Jianbo Chen | Lihua Lei | Jianbo Chen | Cheng Ju

[1] Paul Tseng,et al. An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule , 1998, SIAM J. Optim..

[2] Alexander J. Smola,et al. Fast Incremental Method for Nonconvex Optimization , 2016, ArXiv.

[3] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.

[4] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..

[5] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[6] Michael I. Jordan,et al. Less than a Single Pass: Stochastically Controlled Stochastic Gradient , 2016, AISTATS.

[7] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[8] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[9] Zeyuan Allen Zhu,et al. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives , 2015, ICML.

[10] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.

[11] Mark W. Schmidt,et al. StopWasting My Gradients: Practical SVRG , 2015, NIPS.

[12] Dimitri P. Bertsekas,et al. A New Class of Incremental Gradient Methods for Least Squares Problems , 1997, SIAM J. Optim..

[13] Alexei A. Gaivoronski,et al. Convergence properties of backpropagation for neural nets via theory of stochastic gradient methods. Part 1 , 1994 .

[14] Mark W. Schmidt,et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition , 2016, ECML/PKDD.

[15] Léon Bottou,et al. A Lower Bound for the Optimization of Finite Sums , 2014, ICML.

[16] Tengyu Ma,et al. Finding approximate local minima faster than gradient descent , 2016, STOC.

[17] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[18] Zeyuan Allen-Zhu. Natasha: Faster Stochastic Non-Convex Optimization via Strongly Non-Convex Parameter , 2017 .

[19] Zeyuan Allen Zhu,et al. Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.

[20] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.

[21] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..

[22] Zeyuan Allen-Zhu,et al. Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter , 2017, ICML.

[23] Yair Carmon,et al. "Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions , 2017, ICML.

[24] Saeed Ghadimi,et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming , 2013, Mathematical Programming.

[25] Zeyuan Allen Zhu,et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent , 2014, ITCS.

[26] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .

[27] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[28] Chong Wang,et al. Stochastic variational inference , 2012, J. Mach. Learn. Res..

[29] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[30] Yair Carmon,et al. Accelerated Methods for Non-Convex Optimization , 2016, SIAM J. Optim..

[31] Tengyu Ma,et al. Finding Approximate Local Minima for Nonconvex Optimization in Linear Time , 2016, ArXiv.