
A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training objective and reducing the testing objective quickly.
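The idea of keeping a memory of previous gradient values can be illustrated with a minimal sketch. The abstract does not spell out the update rule, so everything below is an assumption made for illustration: the objective (L2-regularized least squares), the function names `sag_least_squares` and `exact_solution`, the zero initialization of the gradient memory, and the step size passed in by the caller. At each iteration the sketch samples one function, refreshes its stored gradient, and steps along the average of all stored gradients.

```python
import numpy as np


def sag_least_squares(A, b, lam, step, n_iters, seed=0):
    """Gradient-memory sketch for
    (1/n) * sum_i [0.5 * (a_i . x - b_i)^2 + 0.5 * lam * ||x||^2].

    Hypothetical helper written for illustration, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    grad_memory = np.zeros((n, d))  # last gradient evaluated for each f_i
    grad_sum = np.zeros(d)          # running sum of the stored gradients
    for _ in range(n_iters):
        i = rng.integers(n)                          # sample one function
        g_new = (A[i] @ x - b[i]) * A[i] + lam * x   # gradient of f_i at x
        grad_sum += g_new - grad_memory[i]           # swap old entry for new
        grad_memory[i] = g_new
        x -= step * grad_sum / n                     # step along the average
    return x


def exact_solution(A, b, lam):
    """Closed-form minimizer of the same objective, used as a reference."""
    n, d = A.shape
    return np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 5))
    b = rng.standard_normal(50)
    lam = 0.1
    # Assumed step size: 1 / (16 * L), with L the largest per-function
    # gradient Lipschitz constant. This choice is not stated in the abstract.
    L = np.max(np.sum(A * A, axis=1)) + lam
    x_hat = sag_least_squares(A, b, lam, step=1.0 / (16.0 * L), n_iters=20000)
    print(np.linalg.norm(x_hat - exact_solution(A, b, lam)))
```

Storing one gradient per function costs memory proportional to the number of functions times the dimension, but each iteration still evaluates only a single gradient, as in a standard stochastic gradient step.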

Mark W. Schmidt | Nicolas Le Roux | Francis R. Bach

[1] Carlos S. Kubrusly,et al. Stochastic approximation algorithms and applications , 1973, CDC 1973.

[2] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .

[3] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .

[4] Bernard Delyon,et al. Accelerated Stochastic Approximation , 1993, SIAM J. Optim..

[5] Alexander J. Smola,et al. Neural Information Processing Systems , 1997, NIPS 1997.

[6] Dimitri P. Bertsekas,et al. A New Class of Incremental Gradient Methods for Least Squares Problems , 1997, SIAM J. Optim..

[7] Paul Tseng,et al. An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule , 1998, SIAM J. Optim..

[8] Mikhail V. Solodov,et al. Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero , 1998, Comput. Optim. Appl..

[9] D. Bertsekas,et al. Convergence Rate of Incremental Subgradient Algorithms , 2000 .

[10] Yann LeCun,et al. Large Scale Online Learning , 2003, NIPS.

[11] Yiming Yang,et al. RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[12] Thorsten Joachims,et al. KDD-Cup 2004: results and analysis , 2004, SKDD.

[13] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[14] H. Zou,et al. Regularization and variable selection via the elastic net , 2005 .

[15] S. Rosset,et al. Piecewise linear regularized solution paths , 2007, 0708.2197.

[16] Alfred O. Hero,et al. A Convergent Incremental Gradient Method with a Constant Step Size , 2007, SIAM J. Optim..

[17] H. Robbins. A Stochastic Approximation Method , 1951 .

[18] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.

[19] Alexander J. Smola,et al. A scalable modular convex solver for regularized risk minimization , 2007, KDD '07.

[20] Nathan Srebro,et al. Fast Rates for Regularized Objectives , 2008, NIPS.

[21] Lin Xiao,et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..

[22] S. V. N. Vishwanathan,et al. Variable Metric Stochastic Approximation Theory , 2009, AISTATS.

[23] Martin J. Wainwright,et al. Information-theoretic lower bounds on the oracle complexity of convex optimization , 2009, NIPS.

[24] Yurii Nesterov,et al. Primal-dual subgradient methods for convex problems , 2005, Math. Program..

[25] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .

[26] Michael I. Jordan,et al. Asymptotically Optimal Regularization in Smooth Parametric Models , 2009, NIPS.

[27] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.

[28] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.

[29] Ingo Steinwart,et al. Optimal learning rates for least squares SVMs using Gaussian kernels , 2011, NIPS.

[30] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[31] John C. Duchi,et al. Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[32] Saeed Ghadimi,et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework , 2012, SIAM J. Optim..

[33] Mark W. Schmidt,et al. Hybrid Deterministic-Stochastic Methods for Data Fitting , 2011, SIAM J. Sci. Comput..