Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

We develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches has become a de facto standard in the machine learning community, because mini-batch techniques stabilize the gradient estimate and lend themselves naturally to parallel computing. The core of our proposed method is the combination of our new ``double acceleration'' technique with a variance reduction technique. We theoretically analyze the proposed method and show that it substantially improves the mini-batch efficiency of previous accelerated stochastic methods: it essentially needs only mini-batches of size $\sqrt{n}$ to achieve the optimal iteration complexities for both non-strongly and strongly convex objectives, where $n$ is the training set size. Furthermore, we show that even in non-mini-batch settings, our method achieves the best known convergence rates for both non-strongly and strongly convex objectives.
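To make the variance reduction ingredient concrete, the following is a minimal sketch of an SVRG-style mini-batch variance-reduced gradient estimator of the kind the abstract refers to; it is not the paper's double-accelerated algorithm, and all function and variable names (e.g. `grad_fi`, `minibatch_vr_gradient`, the toy ridge-regression objective, the step size) are illustrative assumptions.

```python
import numpy as np

def minibatch_vr_gradient(grad_fi, x, x_snap, g_snap, batch):
    """SVRG-style variance-reduced mini-batch gradient estimate:
    (1/|B|) * sum_{i in B} [grad_fi(x, i) - grad_fi(x_snap, i)] + g_snap.
    Unbiased for the full gradient at x; its variance shrinks as x and the
    snapshot x_snap approach the optimum."""
    diff = sum(grad_fi(x, i) - grad_fi(x_snap, i) for i in batch) / len(batch)
    return diff + g_snap

# --- toy usage on a ridge-regularized least-squares problem (illustrative) ---
rng = np.random.default_rng(0)
n, d, lam = 200, 10, 1e-2
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def grad_fi(x, i):
    # Gradient of the i-th component f_i(x) = 0.5*(a_i^T x - y_i)^2 + 0.5*lam*||x||^2.
    return A[i] * (A[i] @ x - y[i]) + lam * x

x = np.zeros(d)
x_snap = x.copy()
# Full gradient at the snapshot, recomputed once per epoch in SVRG-type schemes.
g_snap = np.mean([grad_fi(x_snap, i) for i in range(n)], axis=0)

# A sqrt(n)-sized mini-batch, matching the size highlighted in the abstract.
batch = rng.choice(n, size=int(np.sqrt(n)), replace=False)
g = minibatch_vr_gradient(grad_fi, x, x_snap, g_snap, batch)
x -= 0.1 * g  # one plain (non-accelerated) gradient step, for illustration only
```

The paper's contribution lies in wrapping such an estimator in a doubly accelerated dual-averaging scheme; the sketch above only shows how the mini-batch variance-reduced gradient itself is formed.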
