An asynchronous parallel stochastic coordinate descent algorithm

We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate ($1/K$) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is $O(n^{1/2})$ in unconstrained optimization and $O(n^{1/4})$ in the separable-constrained case, where $n$ is the number of variables. We describe results from implementation on 40-core processors.
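The abstract does not spell out the update rule. As we understand the authors' scheme, each core repeatedly picks a coordinate $i$ uniformly at random, evaluates the partial gradient $\nabla_i f$ at a possibly stale read $\hat{x}$ of the shared iterate, and sets $x_i \leftarrow P_{\Omega_i}\big(x_i - \tfrac{\gamma}{L_{\max}} \nabla_i f(\hat{x})\big)$ without locking, where $\Omega_i$ is the feasible set for coordinate $i$ (the whole real line when unconstrained). The Python sketch below illustrates that pattern under stated assumptions: the objective is an assumed quadratic $f(x) = \tfrac12 x^\top A x - b^\top x$ with symmetric positive definite $A$, the separable constraint is an assumed box $lo \le x \le hi$, and the names `asy_scd`, `gamma`, `n_threads` and `iters_per_thread` are hypothetical. Python threads under the GIL only imitate lock-free shared-memory execution; they do not reproduce the paper's multicore speedup.

```python
import threading

import numpy as np


def asy_scd(A, b, lo=None, hi=None, n_threads=4, iters_per_thread=5000,
            gamma=0.5, seed=0):
    """Hypothetical sketch: minimize 0.5 x^T A x - b^T x, optionally s.t. lo <= x <= hi.

    All threads share one iterate x and update it in place without locks.
    Each thread picks a random coordinate i, reads x (possibly while other
    threads are writing), and takes a projected coordinate step of length
    gamma / L_max.
    """
    n = A.shape[0]
    x = np.zeros(n)  # shared iterate
    # For a quadratic, grad_i f changes by A[i, i] * t when x_i changes by t,
    # so the componentwise Lipschitz constants are the diagonal entries of A.
    L_max = float(np.max(np.diag(A)))
    lo = np.full(n, -np.inf) if lo is None else np.asarray(lo, dtype=float)
    hi = np.full(n, np.inf) if hi is None else np.asarray(hi, dtype=float)

    def worker(tid):
        rng = np.random.default_rng(seed + tid)  # independent stream per thread
        for _ in range(iters_per_thread):
            i = rng.integers(n)
            g_i = A[i] @ x - b[i]  # partial gradient at a possibly stale x
            # The box is separable, so projecting onto it is a per-coordinate clip.
            x[i] = min(max(x[i] - (gamma / L_max) * g_i, lo[i]), hi[i])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x


# Usage on a small, well-conditioned random problem.
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 20))
A = M.T @ M / 20 + np.eye(20)
b = rng.standard_normal(20)
x_free = asy_scd(A, b)
x_box = asy_scd(A, b, lo=np.zeros(20), hi=np.full(20, 0.1))
```

Choosing `gamma` below 1 leaves slack for stale reads. In the authors' analysis the admissible step length is tied to a bound on how stale a read can be, which this sketch does not track.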

Stephen J. Wright | Christopher Ré | Ji Liu | Victor Bittorf | Srikrishna Sridhar
