Paper Information - Sparse Allreduce: Efficient Scalable Communication for Power-Law Data

Sparse Allreduce: Efficient Scalable Communication for Power-Law Data

Many large datasets exhibit power-law statistics: the web graph, social networks, text data, click-through data, etc. Their adjacency graphs are termed natural graphs and are known to be difficult to partition. As a consequence, most distributed algorithms on these graphs are communication-intensive. Many algorithms on natural graphs involve an Allreduce: a sum or average of partitioned data that is then shared back to the cluster nodes. Examples include PageRank, spectral partitioning, and many machine learning algorithms, including regression, factor (topic) models, and clustering. In this paper we describe an efficient and scalable Allreduce primitive for power-law data. We point out scaling problems with existing butterfly and round-robin networks for Sparse Allreduce, and show that a hybrid approach improves on both. Furthermore, we show that Sparse Allreduce stages should be nested instead of cascaded (as in the dense case), and that the optimum-throughput Allreduce network should be a butterfly of heterogeneous degree, where degree decreases with depth into the network. Finally, a simple replication scheme is introduced to deal with node failures. We present experiments showing significant improvements over existing systems such as PowerGraph and Hadoop.
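As a rough illustration of the Allreduce described in the abstract (a sum of partitioned data shared back to every node), the following sketch simulates a plain degree-2 butterfly, also known as recursive doubling, over sparse vectors in a single process. It is not the nested, heterogeneous-degree network the paper proposes. The names `sparse_add` and `butterfly_allreduce`, the `{index: value}` dict representation, and the power-of-two node count are all assumptions made for this example.

```python
def sparse_add(a, b):
    """Sum two sparse vectors stored as {index: value} dicts."""
    out = dict(a)
    for idx, val in b.items():
        out[idx] = out.get(idx, 0.0) + val
    return out


def butterfly_allreduce(node_vectors):
    """Simulate a degree-2 butterfly (recursive doubling) sum-Allreduce.

    node_vectors holds one sparse vector per simulated node.
    Assumption: the number of nodes is a power of two.
    """
    n = len(node_vectors)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two node count"
    state = [dict(v) for v in node_vectors]
    stride = 1
    while stride < n:
        # Each round, node i exchanges with partner i XOR stride,
        # and both keep the sum of the two partial results.
        state = [sparse_add(state[i], state[i ^ stride]) for i in range(n)]
        stride *= 2
    return state


# Hypothetical data: four nodes, each holding a few nonzero entries.
vectors = [
    {0: 1.0, 3: 2.0},
    {0: 1.0, 7: 4.0},
    {3: 1.0},
    {0: 2.0, 9: 1.0},
]
result = butterfly_allreduce(vectors)
for node_id, vec in enumerate(result):
    print(node_id, sorted(vec.items()))
```

After log2(n) rounds every simulated node holds the same summed vector. In this simple scheme each node carries the union of all nonzero indices by the final round, so message sizes grow with depth, which is one reason sparse data needs more care than the dense case.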

John F. Canny | Huasha Zhao
