Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method for solving stochastic optimization problems that arise in machine learning. For strongly convex problems, running SGD for T iterations and returning the average point was known to attain a convergence rate of O(log(T)/T). However, recent results showed that a different algorithm can attain the optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and perhaps should even be replaced as the method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change to the algorithm is necessary. We also present experimental results that support our findings, and point out open problems.
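To make the contrast between averaging schemes concrete, here is a minimal Python sketch, not the authors' code. It runs projected SGD on a one-dimensional λ-strongly convex objective and returns the last iterate, the average of all iterates, and a suffix average over only the last α fraction of iterates. Several things are assumptions: the abstract says only that "a simple modification of the averaging step" suffices, and suffix averaging is assumed here as that modification; the step size 1/(λt) is a common choice for λ-strongly convex objectives, not one stated above; and the function names `projected_sgd` and `noisy_subgradient`, the example objective F(w) = (λ/2)w² + |w| (non-smooth at its minimizer w = 0), the Gaussian noise, the domain radius, and α = 0.5 are all hypothetical illustrations.

```python
import math
import random


def projected_sgd(grad, w0, lam, T, radius, alpha=0.5, seed=0):
    """Projected SGD on a 1-D lam-strongly convex objective over [-radius, radius].

    Assumed step size: eta_t = 1/(lam * t). Returns a tuple
    (last iterate, average of all T iterates, average of the last ceil(alpha*T) iterates).
    """
    rng = random.Random(seed)                 # seeded so runs are reproducible
    w = w0
    iterates = []
    for t in range(1, T + 1):
        g = grad(w, rng)                      # stochastic (sub)gradient at the current point
        w = w - g / (lam * t)                 # gradient step with eta_t = 1/(lam * t)
        w = min(max(w, -radius), radius)      # project back onto the domain [-radius, radius]
        iterates.append(w)
    full_avg = sum(iterates) / T              # standard averaging over all iterates
    k = max(1, math.ceil(alpha * T))          # hypothetical suffix length: last alpha fraction
    suffix_avg = sum(iterates[-k:]) / k       # suffix averaging: early iterates are discarded
    return w, full_avg, suffix_avg


def noisy_subgradient(w, rng, lam=0.1):
    """Hypothetical objective F(w) = lam/2 * w**2 + |w|, minimized at w = 0,
    observed only through subgradients corrupted by standard Gaussian noise."""
    sign = (w > 0) - (w < 0)                  # a valid subgradient of |w| (0 at w = 0)
    return lam * w + sign + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    last, full, suffix = projected_sgd(noisy_subgradient, w0=1.0, lam=0.1, T=10_000, radius=5.0)
    print(f"last={last:.4f}  full average={full:.4f}  suffix average={suffix:.4f}")
```

The only difference between the two averaged outputs is which iterates enter the mean: the full average keeps the early, poorly localized iterates produced by the large initial steps, while the suffix average drops them. The sketch does not reproduce the paper's bounds or experiments; it only shows where the averaging choice enters the algorithm.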
