SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization

There has been extensive research on developing stochastic variance-reduced methods to solve large-scale optimization problems. More recently, novel algorithms of this type, SARAH \citep{Lam2017a,Lam2017b} and SPIDER \citep{Fang2018}, have been developed. In particular, SPIDER has been shown in \cite{Fang2018} to outperform existing algorithms of the same type and to meet the lower bound in certain regimes for nonconvex optimization. Though appealing in theory, SPIDER requires an $\epsilon$-level stepsize to guarantee convergence, and consequently runs slowly in practice. This paper proposes SpiderBoost as an improved scheme, which comes with two major advantages over SPIDER. First, it allows a much larger stepsize without sacrificing the convergence rate, and hence runs substantially faster in practice. Second, it extends much more easily to proximal algorithms with guaranteed convergence for solving composite optimization problems, which appears challenging for SPIDER due to its stringent requirement on the per-iteration increment. Both advantages stem from the new convergence analysis we develop for SpiderBoost, which allows much more flexibility in choosing the algorithm parameters. As a further generalization of SpiderBoost, we show that proximal SpiderBoost achieves a stochastic first-order oracle (SFO) complexity of $\mathcal{O}(\min \{n^{1/2}\epsilon^{-2},\epsilon^{-3}\})$ for composite optimization, which improves the best existing results by a factor of $\mathcal{O}(\min\{n^{1/6},\epsilon^{-1/3}\})$. Moreover, for nonconvex optimization under the gradient dominance condition and in an online setting, we also obtain improved oracle complexity results compared to the corresponding state-of-the-art results.
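To make the scheme concrete, below is a minimal sketch of a SpiderBoost-style iteration for a finite-sum objective $f(x)=\frac{1}{n}\sum_{i=1}^n f_i(x)$; it is not the authors' reference implementation. The per-sample gradient oracle `grad_i`, the epoch length and mini-batch size of order $\sqrt{n}$, and the constant stepsize $\eta = 1/(2L)$ are illustrative assumptions rather than specifics taken from the abstract.

```python
# A minimal sketch of a SpiderBoost-style update (illustrative, not the paper's code).
# Assumed interface: grad_i(x, i) returns the gradient of the i-th component f_i at x.
import numpy as np

def spiderboost(grad_i, x0, n, L, num_iters, rng=None):
    """SPIDER-type recursive gradient estimator with a constant (non-epsilon) stepsize."""
    rng = np.random.default_rng() if rng is None else rng
    q = int(np.ceil(np.sqrt(n)))       # restart period for the full-gradient refresh (assumed)
    batch = int(np.ceil(np.sqrt(n)))   # mini-batch size between refreshes (assumed)
    eta = 1.0 / (2.0 * L)              # large constant stepsize enabled by SpiderBoost (assumed)
    x_prev = x = np.asarray(x0, dtype=float)
    v = None
    for k in range(num_iters):
        if k % q == 0:
            # Refresh: full gradient over all n components.
            v = np.mean([grad_i(x, i) for i in range(n)], axis=0)
        else:
            # Recursive SPIDER estimator on a small random batch S (sampled with replacement).
            S = rng.integers(0, n, size=batch)
            v = np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in S], axis=0) + v
        # Plain gradient step; a proximal variant would instead use x = prox_{eta*h}(x - eta*v).
        x_prev, x = x, x - eta * v
    return x
```

The key design point the sketch illustrates is that the stepsize is a constant of order $1/L$ rather than scaling with the target accuracy $\epsilon$ as in SPIDER, which is what allows fast progress in practice while retaining the same order of SFO complexity.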

[1] Yurii Nesterov et al. Introductory Lectures on Convex Optimization: A Basic Course, 2014, Applied Optimization.

[2] Mark W. Schmidt et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets, 2012, NIPS.

[3] Tong Zhang et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.

[4] Francis Bach et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.

[5] Mark W. Schmidt et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.

[6] Yi Zhou et al. Geometrical properties and accelerated gradient solvers of non-convex phase retrieval, 2016, 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[7] Saeed Ghadimi et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.

[8] Alexander J. Smola et al. Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization, 2016, NIPS.

[9] Zeyuan Allen-Zhu et al. Variance Reduction for Faster Non-Convex Optimization, 2016, ICML.

[10] Alexander J. Smola et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.

[11] Michael I. Jordan et al. Non-convex Finite-Sum Optimization Via SCSG Methods, 2017, NIPS.

[12] Yi Zhou et al. Characterization of Gradient Dominance and Regularity Conditions for Neural Networks, 2017, arXiv.

[13] Jie Liu et al. Stochastic Recursive Gradient Algorithm for Nonconvex Optimization, 2017, arXiv.

[14] Jie Liu et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017, ICML.

[15] Jian Li et al. A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization, 2018, NeurIPS.

[16] Tong Zhang et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator, 2018, NeurIPS.

[17] Zeyuan Allen-Zhu et al. Natasha 2: Faster Non-Convex Optimization Than SGD, 2017, NeurIPS.

[18] Quanquan Gu et al. Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization, 2018, NeurIPS.

[19] Hongyi Zhang et al. R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate, 2018, arXiv.

[20] Jorge Nocedal et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Review.

[21] Xiaodong Li et al. Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization, 2016, Applied and Computational Harmonic Analysis.