A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets

We propose a new stochastic gradient method for optimizing the sum of a nite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training objective and reducing the testing objective quickly.

[1]  Carlos S. Kubrusly,et al.  Stochastic approximation algorithms and applications , 1973, CDC 1973.

[2]  John Darzentas,et al.  Problem Complexity and Method Efficiency in Optimization , 1983 .

[3]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[4]  Bernard Delyon,et al.  Accelerated Stochastic Approximation , 1993, SIAM J. Optim..

[5]  Alexander J. Smola,et al.  Neural Information Processing Systems , 1997, NIPS 1997.

[6]  Dimitri P. Bertsekas,et al.  A New Class of Incremental Gradient Methods for Least Squares Problems , 1997, SIAM J. Optim..

[7]  Paul Tseng,et al.  An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule , 1998, SIAM J. Optim..

[8]  Mikhail V. Solodov,et al.  Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero , 1998, Comput. Optim. Appl..

[9]  D. Bertsekas,et al.  Convergen e Rate of In remental Subgradient Algorithms , 2000 .

[10]  Yann LeCun,et al.  Large Scale Online Learning , 2003, NIPS.

[11]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[12]  Thorsten Joachims,et al.  KDD-Cup 2004: results and analysis , 2004, SKDD.

[13]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[14]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[15]  S. Rosset,et al.  Piecewise linear regularized solution paths , 2007, 0708.2197.

[16]  Alfred O. Hero,et al.  A Convergent Incremental Gradient Method with a Constant Step Size , 2007, SIAM J. Optim..

[17]  H. Robbins A Stochastic Approximation Method , 1951 .

[18]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[19]  Alexander J. Smola,et al.  A scalable modular convex solver for regularized risk minimization , 2007, KDD '07.

[20]  Nathan Srebro,et al.  Fast Rates for Regularized Objectives , 2008, NIPS.

[21]  Lin Xiao,et al.  Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..

[22]  S. V. N. Vishwanathan,et al.  Variable Metric Stochastic Approximation Theory , 2009, AISTATS.

[23]  Martin J. Wainwright,et al.  Information-theoretic lower bounds on the oracle complexity of convex optimization , 2009, NIPS.

[24]  Yurii Nesterov,et al.  Primal-dual subgradient methods for convex problems , 2005, Math. Program..

[25]  Alexander Shapiro,et al.  Stochastic Approximation approach to Stochastic Programming , 2013 .

[26]  Michael I. Jordan,et al.  Asymptotically Optimal Regularization in Smooth Parametric Models , 2009, NIPS.

[27]  James Martens,et al.  Deep learning via Hessian-free optimization , 2010, ICML.

[28]  Eric Moulines,et al.  Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.

[29]  Ingo Steinwart,et al.  Optimal learning rates for least squares SVMs using Gaussian kernels , 2011, NIPS.

[30]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[31]  John C. Duchi,et al.  Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[32]  Saeed Ghadimi,et al.  Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework , 2012, SIAM J. Optim..

[33]  Mark W. Schmidt,et al.  Hybrid Deterministic-Stochastic Methods for Data Fitting , 2011, SIAM J. Sci. Comput..