Iterate averaging as regularization for stochastic gradient descent