Accelerate Stochastic Subgradient Method by Leveraging Local Error Bound

In this paper, we propose two accelerated stochastic {\bf subgradient} methods for stochastic non-strongly convex optimization problems by leveraging a generic local error bound condition. The novelty of the proposed methods lies in smartly leveraging a recent historical solution to tackle the variance in the stochastic subgradient. The key idea of both methods is to iteratively solve the original problem approximately in a local region around a recent historical solution, with the size of the local region gradually decreasing as the solution approaches the optimal set. The two methods differ in how they construct the local region: the first uses an explicit ball constraint, while the second uses an implicit regularization approach. For both methods, we establish an improved iteration complexity that holds with high probability for achieving an $\epsilon$-optimal solution. Besides the improved order of iteration complexity with high probability, the proposed algorithms also enjoy a logarithmic dependence on the distance of the initial solution to the optimal set. When applied to $\ell_1$ regularized polyhedral loss minimization (e.g., hinge loss, absolute loss), the proposed stochastic methods have a logarithmic iteration complexity.
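The explicit-ball variant described above can be sketched as restarted projected stochastic subgradient descent: each stage runs constrained to a ball around the previous stage's averaged iterate, with the ball radius and step size halved between stages. The sketch below is our own minimal illustration on an $\ell_1$-regularized absolute-loss problem; all function names, the stage/step-size schedule, and the toy objective are our assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

# Hedged sketch (not the paper's exact method): minimize
#   F(x) = (1/n) * sum_i |a_i^T x - b_i| + lam * ||x||_1
# by stochastic subgradient steps confined to a shrinking ball
# around the previous stage's solution.

def subgrad(x, a, b, lam):
    # A stochastic subgradient of |a^T x - b| + lam * ||x||_1 at x,
    # using one sampled data point (a, b).
    return a * np.sign(a @ x - b) + lam * np.sign(x)

def project_ball(x, center, radius):
    # Euclidean projection onto the ball B(center, radius).
    d = x - center
    nrm = np.linalg.norm(d)
    return x if nrm <= radius else center + d * (radius / nrm)

def assg_ball(A, b, lam, x0, D0=10.0, eta0=1.0, stages=8, T=2000, seed=0):
    # Restarted scheme: each stage stays inside a ball around the
    # previous stage's averaged iterate; the ball radius D and step
    # size eta are halved after every stage.
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x_stage, D, eta = x0.copy(), D0, eta0
    for _ in range(stages):
        x, avg = x_stage.copy(), np.zeros_like(x_stage)
        for _ in range(T):
            i = rng.integers(n)  # sample one data point
            x = project_ball(x - eta * subgrad(x, A[i], b[i], lam),
                             x_stage, D)
            avg += x
        x_stage = avg / T          # restart from the averaged iterate
        D, eta = D / 2.0, eta / 2.0  # shrink the local region and step
    return x_stage
```

The halving schedule mirrors the idea that the distance to the optimal set contracts geometrically under the local error bound, which is what yields the logarithmic dependence on the initial distance.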
