An Adaptive Parallel Algorithm for Computing Connectivity

We present an efficient distributed-memory parallel algorithm for computing connected components in undirected graphs, based on the Shiloach-Vishkin PRAM approach. We discuss multiple optimization techniques that reduce communication volume and balance the load to improve the performance of the algorithm in practice. We also note that the efficiency of parallel graph connectivity algorithms depends on the underlying graph topology. In particular, for graph components with a short diameter, we observe that parallel breadth-first search (BFS) offers better performance. However, parallel BFS is not efficient for computing large-diameter components or a large number of small components. To address this challenge, we employ a heuristic that allows the algorithm to quickly predict the type of the network by computing the degree distribution and then follow the optimal hybrid route. Using large graphs with diverse topologies from domains including metagenomics, web crawls, social graphs, and road networks, we show that our hybrid implementation is efficient and scalable for each of the graph types. Our approach achieves a runtime of 215 seconds using 32K cores of a Cray XC30 for a metagenomic graph with over 50 billion edges. Compared with the previous state-of-the-art method, we see performance improvements of up to 24x.
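As an illustration of the Shiloach-Vishkin idea the algorithm builds on, the following is a minimal sequential sketch of the two alternating steps: hooking (a root attaches itself to a smaller label found across an edge) and pointer jumping (each vertex shortcuts to the root of its tree). It is not the distributed implementation described here; the function name `connected_components` and the edge-list input format are assumptions made for this example, and it omits the communication and load-balancing optimizations entirely.

```python
def connected_components(num_vertices, edges):
    """Sequential emulation of Shiloach-Vishkin-style hooking and pointer jumping.

    Hypothetical helper for illustration only. Returns a list `parent` in which
    parent[v] is the smallest vertex id in the component containing v.
    """
    # Every vertex starts as the root of its own single-vertex tree.
    parent = list(range(num_vertices))

    changed = True
    while changed:
        changed = False

        # Hooking: for each edge joining two different trees, attach the
        # root with the larger label to the smaller label.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu == pv:
                continue
            hi, lo = max(pu, pv), min(pu, pv)
            if parent[hi] == hi:  # only hook vertices that are still roots
                parent[hi] = lo
                changed = True

        # Pointer jumping: shortcut every vertex until it points at a root.
        for v in range(num_vertices):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]

    return parent


if __name__ == "__main__":
    # Two components with edges, plus one isolated vertex (5).
    labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
    print(labels)
```

In a PRAM or distributed setting both loops would run in parallel over edges and vertices; the sequential loops here only mimic the order-independent outcome of each round. Because a parent always carries a label no larger than its child, the hooks cannot form a cycle, and each component ends up labeled by its smallest vertex id.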
