Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization

Two new stochastic variance-reduced algorithms, SARAH and SPIDER, have recently been proposed, and SPIDER has been shown to achieve a near-optimal gradient oracle complexity for nonconvex optimization. However, this theoretical advantage does not translate into a substantial improvement in practical performance over SVRG. To address this issue, momentum is a natural candidate for improving the performance of SPIDER. However, existing momentum schemes used in variance-reduced algorithms are designed specifically for convex optimization and are not applicable to nonconvex scenarios. In this paper, we develop novel momentum schemes with flexible coefficient settings to accelerate SPIDER for nonconvex and nonsmooth composite optimization, and we show that the resulting algorithms achieve near-optimal gradient oracle complexity for attaining a generalized first-order stationarity condition. Furthermore, we generalize our algorithm to online nonconvex and nonsmooth optimization and establish an oracle complexity result that matches the state of the art. Our extensive experiments demonstrate the superior performance of our proposed algorithm over other stochastic variance-reduced algorithms.
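To make the algorithmic idea concrete, the following minimal sketch (Python with NumPy) combines a SPIDER-style recursive gradient estimator with a heavy-ball-style momentum step and a proximal update, illustrated on an l1-regularized least-squares problem. This is only an illustration of the general technique under assumed settings, not the paper's exact algorithm or momentum coefficient schedule; the function names (`prox_l1`, `spider_momentum`) and all parameter values are hypothetical.

```python
# Sketch of a proximal SPIDER-type method with a momentum step for
#   min_x  (1/n) * sum_i f_i(x) + h(x),
# using f_i(x) = 0.5 * (a_i^T x - b_i)^2 and h(x) = lam * ||x||_1.
# Illustrative only; names and step sizes are assumptions, not the paper's settings.
import numpy as np

def prox_l1(z, thresh):
    """Proximal operator of thresh * ||x||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def spider_momentum(A, b, lam=0.1, eta=0.05, beta=0.5, epochs=10, batch=16, q=50):
    """Proximal SPIDER with a heavy-ball-style momentum term (illustrative)."""
    n, d = A.shape
    x = np.zeros(d)
    x_prev = x.copy()
    rng = np.random.default_rng(0)
    v = A.T @ (A @ x - b) / n                 # full gradient at initialization
    for t in range(epochs * q):
        if t % q == 0:
            # periodic full-gradient refresh, as in SPIDER/SARAH-type estimators
            v = A.T @ (A @ x - b) / n
        else:
            # recursive estimator: v <- v + mean_i [grad f_i(x) - grad f_i(x_prev)]
            idx = rng.choice(n, size=batch, replace=False)
            Ai = A[idx]
            v = v + Ai.T @ (Ai @ (x - x_prev)) / batch
        y = x + beta * (x - x_prev)            # momentum extrapolation with coefficient beta
        x_prev = x
        x = prox_l1(y - eta * v, eta * lam)    # proximal gradient step on the composite objective
    return x
```

The periodic full-gradient refresh every q inner iterations keeps the variance of the recursive estimator controlled, while the extrapolation term `beta * (x - x_prev)` supplies the momentum; in this sketch the coefficient `beta` is fixed, whereas flexible coefficient settings are the focus of the paper.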
