NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion

We develop an efficient parallel distributed algorithm for matrix completion, named NOMAD (Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion). NOMAD is a decentralized algorithm with non-blocking communication between processors. A key feature of NOMAD is that ownership of a variable is transferred asynchronously between processors in a decentralized fashion; as a consequence, it is a lock-free parallel algorithm. Despite being asynchronous, the variable updates of NOMAD are serializable, that is, there is an equivalent update ordering in a serial implementation. NOMAD outperforms synchronous algorithms that require explicit bulk synchronization after every iteration: our extensive empirical evaluation shows that the algorithm not only performs well in a distributed setting on commodity hardware, but also outperforms state-of-the-art algorithms on an HPC cluster, in both multi-core and distributed-memory settings.

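To make the ownership-passing idea concrete, the following is a minimal Python sketch, not the authors' implementation; all sizes, hyperparameters, and names (n_workers, step, reg, and so on) are invented for illustration. Each worker thread owns a fixed block of user rows, repeatedly takes an item column from its own queue, performs SGD updates against the ratings it holds, and then passes ownership of that column to another worker, so no locks are needed because only the current owner of a column ever updates it.

    import queue
    import threading

    import numpy as np

    # Toy setup (all sizes and hyperparameters are made up for illustration):
    # factorize a dense ratings matrix R ~= W @ H.T with k latent dimensions.
    rng = np.random.default_rng(0)
    n_users, n_items, k = 100, 50, 8
    R = rng.random((n_users, n_items))
    n_workers, step, reg, n_rounds = 4, 0.01, 0.05, 20

    W = rng.normal(scale=0.1, size=(n_users, k))
    H = rng.normal(scale=0.1, size=(n_items, k))

    # Each worker permanently owns a block of user rows; item columns circulate.
    row_blocks = np.array_split(np.arange(n_users), n_workers)
    queues = [queue.Queue() for _ in range(n_workers)]
    for j in range(n_items):
        queues[j % n_workers].put(j)  # initial ownership of item columns

    def worker(wid):
        rows = row_blocks[wid]
        for _ in range((n_rounds * n_items) // n_workers):
            j = queues[wid].get()            # acquire ownership of item column j
            for i in rows:                   # SGD over the ratings this worker holds
                err = R[i, j] - W[i] @ H[j]
                w_i = W[i].copy()
                W[i] += step * (err * H[j] - reg * W[i])
                H[j] += step * (err * w_i - reg * H[j])
            queues[(wid + 1) % n_workers].put(j)  # hand column j to the next worker

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("final RMSE:", np.sqrt(np.mean((R - W @ H.T) ** 2)))

Because each worker touches only its own user rows and the single item column it currently owns, no two workers ever update the same parameter at the same time, which is the intuition behind the serializability claim in the abstract.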