CYCLADES: Conflict-free Asynchronous Machine Learning

We present CYCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared-memory setting. Like HOGWILD!-type algorithms, CYCLADES performs shared model updates asynchronously and requires no memory-locking mechanisms. Unlike HOGWILD!, however, CYCLADES introduces no conflicts during the parallel execution and offers a black-box analysis that yields provable speedups for a large family of algorithms. Due to its inherent conflict-free nature and cache locality, our multi-core implementation of CYCLADES consistently outperforms HOGWILD!-type algorithms on sufficiently sparse datasets, yielding speedups of up to 40% over the HOGWILD! implementation of SGD and up to 5x over asynchronous implementations of variance-reduction algorithms.
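The conflict-free schedule the abstract describes can be illustrated with a small sketch: sample a batch of sparse updates, split the batch into connected components of its conflict graph (two updates conflict if they touch a shared model coordinate), and hand whole components to cores so no two cores ever write the same coordinate. The code below is a minimal, serial simulation under toy assumptions; the names `grad_fn`, `supports`, and the union-find helper are illustrative, not the paper's implementation.

```python
import numpy as np
from collections import defaultdict


def conflict_free_groups(batch, supports):
    """Split a batch of update ids into connected components of the
    conflict graph: two updates conflict if they touch a shared model
    coordinate. Implemented with a small union-find over coordinates."""
    parent = {}

    def find(x):
        root = parent.setdefault(x, x)
        if root != x:
            root = find(root)
            parent[x] = root  # path compression
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Link each update to every coordinate it touches.
    for u in batch:
        for coord in supports[u]:
            union(("update", int(u)), ("coord", coord))

    groups = defaultdict(list)
    for u in batch:
        groups[find(("update", int(u)))].append(int(u))
    return list(groups.values())


def cyclades_style_sgd(x, grad_fn, supports, batch_size, n_cores, lr=0.1, seed=0):
    """Serial simulation of the conflict-free schedule: sample a batch,
    group it into conflict-free components, and allocate whole components
    to cores. Components touch disjoint coordinates, so the per-core inner
    loops could run asynchronously and lock-free without any conflicts."""
    n_updates = len(supports)
    order = np.random.default_rng(seed).permutation(n_updates)
    for start in range(0, n_updates, batch_size):
        batch = order[start:start + batch_size]
        groups = conflict_free_groups(batch, supports)
        # Round-robin allocation of components to cores (simulated serially here).
        for core in range(n_cores):
            for group in groups[core::n_cores]:
                for u in group:                  # within a component: sequential
                    for i, g in grad_fn(u, x):   # sparse gradient entries of update u
                        x[i] -= lr * g           # no locks: no other core touches x[i]
    return x
```

In a sparse least-squares setting, for instance, `supports[u]` would be the nonzero feature indices of the u-th data point and `grad_fn(u, x)` would return the gradient entries on exactly those coordinates; the grouping step then guarantees that concurrently processed updates never collide.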
