Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives

Many classical algorithms are found, only years after they were conceived, to outlive the confines for which they were designed and to remain relevant in unforeseen settings. In this paper, we show that SVRG is one such method: although originally designed for strongly convex objectives, it is also very robust in non-strongly convex and sum-of-non-convex settings. More precisely, we provide a new analysis that improves the state-of-the-art running times in both settings, either by applying SVRG directly or by using a novel variant of it. Since non-strongly convex objectives include important examples such as Lasso and logistic regression, and sum-of-non-convex objectives include famous examples such as stochastic PCA and are even believed to be related to training deep neural networks, our results also imply better performance in these applications.
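For reference, the following is a minimal sketch of the classical SVRG loop of Johnson and Zhang for a finite-sum objective f(w) = (1/n) * sum_i f_i(w). It illustrates the variance-reduction idea only and is not the improved variant analyzed in this paper; the step size `eta`, the number of epochs, and the inner-loop length `m` are illustrative assumptions.

```python
import numpy as np

def svrg(grad_i, w0, n, eta=0.01, epochs=20, m=None, rng=None):
    """Minimal SVRG sketch for f(w) = (1/n) * sum_i f_i(w).

    grad_i(w, i) returns the gradient of the i-th component f_i at w.
    This is the textbook SVRG loop, not this paper's variant; eta,
    epochs, and m are illustrative choices, not tuned values.
    """
    rng = np.random.default_rng(rng)
    m = m or 2 * n                       # common heuristic for the inner-loop length
    w_snap = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        # Full gradient at the snapshot point (computed once per epoch).
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= eta * g
        w_snap = w                       # use the last inner iterate as the new snapshot
    return w_snap
```

For a non-strongly convex example such as unregularized logistic regression, `grad_i(w, i)` would return the gradient of the logistic loss on the i-th data point; strong convexity is not required for the loop itself to run, which is the regime this paper's analysis addresses.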
