Distributed Coordinate Descent Method for Learning with Big Data

In this paper we develop and analyze Hydra: a HYbriD cooRdinAte descent method for solving loss minimization problems with big data. We first partition the coordinates (features) and assign each partition to a different node of a cluster. At every iteration, each node picks a random subset of the coordinates it owns, independently of the other nodes, and in parallel computes and applies updates to the selected coordinates using a simple closed-form formula. We give bounds on the number of iterations sufficient to approximately solve the problem with high probability, and show how these bounds depend on the data and on the partitioning. We perform numerical experiments on a LASSO instance described by a 3 TB matrix.

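To make the abstract's update rule concrete, here is a minimal single-process sketch in Python/NumPy that simulates the scheme for the LASSO objective 0.5*||Ax - b||^2 + lam*||x||_1: coordinates are partitioned once across c simulated nodes, each node samples tau of its own coordinates per iteration, and each selected coordinate is updated by a closed-form soft-thresholding step. All names, defaults, and in particular the constant step-size parameter beta are illustrative assumptions; the paper derives a data- and partition-dependent beta.

```python
import numpy as np

def hydra_lasso(A, b, lam, c=4, tau=2, beta=2.0, iters=500, seed=0):
    """Simulated Hydra-style iteration for LASSO:
        min_x 0.5*||A x - b||^2 + lam*||x||_1.

    c, tau, beta, iters and the function name are illustrative, not
    taken from the paper.  Assumes A has no zero columns.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)                       # coordinate-wise Lipschitz constants
    parts = np.array_split(rng.permutation(n), c)  # fixed random partition across "nodes"
    x = np.zeros(n)
    r = A @ x - b                                  # residual Ax - b, maintained incrementally
    for _ in range(iters):
        # Each node independently samples tau coordinates from its own partition.
        chosen = np.concatenate(
            [rng.choice(p, size=min(tau, len(p)), replace=False) for p in parts])
        r_snap = r.copy()  # all updates are computed from the same iterate ("in parallel")
        for i in chosen:
            g = A[:, i] @ r_snap                   # partial derivative of the smooth part
            u = x[i] - g / (beta * L[i])
            z = np.sign(u) * max(abs(u) - lam / (beta * L[i]), 0.0)  # soft-thresholding
            r += A[:, i] * (z - x[i])              # apply the coordinate update
            x[i] = z
    return x
```

The role of beta is the key design choice: it damps the per-coordinate steps so that many coordinates can be updated simultaneously without the combined step overshooting; the paper's analysis ties the admissible beta (and hence the iteration bound) to the data and the partitioning, whereas the fixed beta=2.0 above is only a placeholder.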