DiSCO: Distributed Optimization for Self-Concordant Empirical Loss

We propose a new distributed algorithm for empirical risk minimization in machine learning. The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method. We analyze its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions, and discuss the results for distributed ridge regression, logistic regression, and binary classification with a smoothed hinge loss. In a standard setting for supervised learning, where the n data points are sampled i.i.d. and the regularization parameter scales as 1/√n, we show that the proposed algorithm is communication efficient: the required number of communication rounds does not increase with the sample size n and grows only slowly with the number of machines.

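The inner step of the method is to compute an inexact Newton direction by solving the system H v = ∇f(w) with a preconditioned conjugate gradient (PCG) iteration, where each Hessian-vector product is aggregated across machines (one round of communication per product). The sketch below is a minimal, single-process illustration of such a PCG inner loop; the callables `hessian_vec` and `precond_solve`, the tolerance, and the iteration cap are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def preconditioned_cg(hessian_vec, grad, precond_solve, tol=1e-6, max_iter=100):
    """Approximately solve H v = grad by preconditioned conjugate gradient.

    hessian_vec(p):  returns H @ p (in a distributed setting, the sum of local
                     Hessian-vector products, aggregated in one communication round).
    precond_solve(r): returns P^{-1} @ r for a preconditioner P built locally
                      (e.g., from the data held on one machine).
    """
    v = np.zeros_like(grad)
    r = grad.copy()              # residual of H v = grad, since v starts at 0
    s = precond_solve(r)
    p = s.copy()
    rs_old = r @ s
    for _ in range(max_iter):
        Hp = hessian_vec(p)
        alpha = rs_old / (p @ Hp)
        v += alpha * p
        r -= alpha * Hp
        if np.linalg.norm(r) <= tol:     # inexact stopping criterion
            break
        s = precond_solve(r)
        rs_new = r @ s
        p = s + (rs_new / rs_old) * p
        rs_old = rs_new
    return v
```

Because each call to `hessian_vec` costs one communication round, the number of PCG iterations per Newton step, rather than the per-machine computation, is what the communication analysis controls.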