CoCoA: A General Framework for Communication-Efficient Distributed Optimization

The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present CoCoA, a general-purpose framework for distributed computing environments with an efficient communication scheme, applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems such as lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach to handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework delivers markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.
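As a brief sketch of the problem class mentioned above (the symbols $A$, $f$, and $g_i$ below are illustrative notation, not quoted from the paper), a convex regularized loss minimization objective can be written as

\[
\min_{\alpha \in \mathbb{R}^n} \; f(A\alpha) \;+\; \sum_{i=1}^{n} g_i(\alpha_i),
\]

where $A \in \mathbb{R}^{d \times n}$ is the data matrix, $f$ is a smooth convex data-fit term, and each $g_i$ is convex but need not be strongly convex or smooth. Under this reading, lasso corresponds to a least-squares $f$ with $g_i(\alpha_i) = \lambda |\alpha_i|$, elastic net adds a quadratic term $\tfrac{\eta}{2}\alpha_i^2$ to each $g_i$, and sparse logistic regression pairs the same L1 term with a logistic data-fit term.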
