Accelerated Variance Reduced Block Coordinate Descent

Algorithms with fast convergence, few data accesses, and low per-iteration complexity are particularly favorable in the big data era, owing to the demand for \emph{highly accurate solutions} to problems with \emph{a large number of samples} in \emph{ultra-high} dimensional spaces. Existing algorithms lack at least one of these qualities and are therefore inefficient at handling such big data challenges. In this paper, we propose a method that enjoys all three merits and attains an accelerated convergence rate of $O(\frac{1}{k^2})$. Empirical studies on large-scale datasets with more than one million features demonstrate the effectiveness of our method in practice.
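The abstract does not state the update rule, but the ingredients it names (variance reduction, randomized block coordinate updates, and acceleration) can be combined in a minimal sketch on a toy least-squares problem. The code below is our own illustration, not the paper's algorithm: the function name `vr_bcd_momentum`, the block partition, the per-block step sizes, and the fixed extrapolation weight `beta` are all assumptions, and the fixed momentum term does not reproduce the $O(\frac{1}{k^2})$ schedule.

```python
import numpy as np

def vr_bcd_momentum(A, b, n_epochs=30, n_blocks=4, beta=0.5, seed=0):
    """Illustrative variance-reduced block coordinate descent with a
    Nesterov-style extrapolation step, for min_x (1/2n)||Ax - b||^2.
    The block partition, step sizes, and fixed momentum `beta` are
    hypothetical choices for demonstration only."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    blocks = np.array_split(np.arange(d), n_blocks)
    # Per-block step sizes from block-wise Lipschitz estimates.
    step = [n / np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    x = np.zeros(d)
    x_prev = np.zeros(d)
    for s in range(n_epochs):
        # Snapshot full gradient: the variance-reduction anchor.
        x_tilde = x.copy()
        g_tilde = A.T @ (A @ x_tilde - b) / n
        for t in range(n):
            y = x + beta * (x - x_prev)   # extrapolated point
            i = rng.integers(n)           # sample one data point
            j = rng.integers(n_blocks)    # sample one coordinate block
            blk = blocks[j]
            a_i = A[i]
            # Variance-reduced stochastic gradient, restricted to `blk`:
            # grad_i(y) - grad_i(x_tilde) + full_grad(x_tilde).
            g_blk = (a_i @ y - b[i]) * a_i[blk] \
                    - (a_i @ x_tilde - b[i]) * a_i[blk] + g_tilde[blk]
            x_prev = x
            x = y.copy()
            x[blk] -= step[j] * g_blk
    return x

# Usage on synthetic data with a planted solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_star = rng.standard_normal(50)
b = A @ x_star
x_hat = vr_bcd_momentum(A, b)
print("relative error:", np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star))
```

Each inner step touches a single sample and a single coordinate block, which is what keeps the per-iteration complexity low; the full gradient is recomputed only once per epoch, which is what keeps the number of data accesses small.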
