Accelerated, Parallel, and Proximal Coordinate Descent

We propose a new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel, and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted average degree of separability of the loss function, $\bar{L}$ is the average of the Lipschitz constants associated with the coordinates and the individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent. The fact that the method depends on the average degree of separability, and not on the maximum degree...
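To make the update structure concrete, below is a minimal sketch of a serial (one coordinate per iteration) accelerated proximal randomized coordinate descent step in the spirit of APPROX, applied to the illustrative lasso problem $\min_x \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$. The problem instance, function names, the choice $\theta_0 = 1/n$, and the use of the coordinate Lipschitz constants $L_i = \|A_i\|^2$ as stepsize parameters are assumptions made for this sketch, not specifics taken from the paper; the sketch also performs naive full-dimensional vector operations for readability, precisely the cost that the paper's efficient implementation avoids.

```python
import numpy as np

def approx_style_cd(A, b, lam, iters=1000, seed=0):
    """Sketch of serial accelerated proximal randomized coordinate
    descent (in the spirit of APPROX) for the lasso problem
        min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Illustrative only: one coordinate per iteration (tau = 1), and
    full vectors are recomputed naively at every step.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = (A ** 2).sum(axis=0)      # coordinate Lipschitz constants L_i = ||A_i||^2
    x = np.zeros(n)
    z = np.zeros(n)
    theta = 1.0 / n               # theta_0 = tau/n with tau = 1 (assumed)

    for _ in range(iters):
        y = (1.0 - theta) * x + theta * z
        i = rng.integers(n)                # sample one coordinate uniformly
        g = A[:, i] @ (A @ y - b)          # partial derivative of f at y
        step = 1.0 / (n * theta * L[i])
        # proximal (soft-thresholding) update of coordinate i of z
        t = z[i] - step * g
        z_new_i = np.sign(t) * max(abs(t) - step * lam, 0.0)
        # x_{k+1} = y_k + n*theta_k*(z_{k+1} - z_k); z changed only at i
        x = y.copy()
        x[i] = y[i] + n * theta * (z_new_i - z[i])
        z[i] = z_new_i
        # momentum update: theta_{k+1} solves theta^2 = (1 - theta)*theta_k^2
        theta = 0.5 * (np.sqrt(theta ** 4 + 4 * theta ** 2) - theta ** 2)
    return x
```

With $\tau$ processors one would instead sample a random subset of $\tau$ coordinates per iteration and replace the $L_i$ by expected-separable-overapproximation parameters $v_i$; the serial case above keeps the sketch short.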
