Coordinate Descent with Arbitrary Sampling I: Algorithms and Complexity (June 2015)

We study the problem of minimizing the sum of a smooth convex function and a convex block-separable regularizer, and propose a new randomized coordinate descent method, which we call ALPHA. In each iteration, our method updates a random subset of coordinates sampled according to an arbitrary distribution. No coordinate descent method capable of handling an arbitrary sampling has previously been studied in the literature for this problem. ALPHA is a remarkably flexible algorithm: in special cases it reduces to deterministic and randomized methods such as gradient descent, coordinate descent, parallel coordinate descent and distributed coordinate descent, in both nonaccelerated and accelerated variants. The variants with arbitrary (or importance) sampling are new. We provide a complexity analysis of ALPHA, from which we deduce, as direct corollaries, complexity bounds for its many variants, all matching or improving the best known bounds.
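To make the arbitrary-sampling template concrete, the following Python sketch shows a simplified, non-accelerated proximal coordinate descent loop of the kind the abstract describes. It is an illustration only, not the ALPHA method itself; the names grad_i and prox_i, the per-coordinate step sizes, the independent Bernoulli sampling of coordinates, and the lasso-type usage example are all assumptions introduced for the sketch.

```python
import numpy as np


def arbitrary_sampling_cd(grad_i, prox_i, x0, p, v, n_iters=1000, seed=0):
    """Sketch of non-accelerated proximal coordinate descent with an
    arbitrary sampling (illustrative only, not the ALPHA method).

    grad_i(x, i)    -- i-th partial derivative of the smooth part f
    prox_i(i, y, t) -- proximal operator of the i-th block of the separable
                       regularizer, with step size t
    p[i]            -- marginal probability that coordinate i is updated
    v[i]            -- smoothness (ESO-type) parameter for coordinate i
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(n_iters):
        # Draw a random set S of coordinates to update. The framework allows
        # an arbitrary distribution over subsets; independent Bernoulli
        # sampling with marginals p[i] is used here purely for concreteness.
        S = np.nonzero(rng.random(n) < p)[0]
        for i in S:
            step = 1.0 / v[i]  # step size governed by the smoothness parameter
            x[i] = prox_i(i, x[i] - step * grad_i(x, i), step)
    return x


# Hypothetical usage: minimize 0.5*||A x - b||^2 + lam*||x||_1.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, b, lam = rng.standard_normal((50, 10)), rng.standard_normal(50), 0.1
    grad_i = lambda x, i: A[:, i] @ (A @ x - b)
    prox_i = lambda i, y, t: np.sign(y) * max(abs(y) - lam * t, 0.0)  # soft-threshold
    p = np.full(10, 0.3)        # each coordinate sampled with probability 0.3
    v = (A ** 2).sum(axis=0)    # simple per-coordinate smoothness bound
    x = arbitrary_sampling_cd(grad_i, prox_i, np.zeros(10), p, v)
```

Swapping the Bernoulli rule for a serial, parallel (mini-batch) or distributed sampling recovers the kinds of special cases listed in the abstract; the choice of sampling and of the parameters v then determines the complexity bound.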
