An Adaptive Parallel Algorithm for Computing Connected Components

We present an efficient distributed-memory parallel algorithm for computing connected components in undirected graphs, based on Shiloach-Vishkin’s PRAM approach. We discuss multiple optimization techniques that reduce communication volume and load-balance the algorithm. We also note that the efficiency of the parallel graph connectivity algorithm depends on the underlying graph topology. In particular, for graph components with short diameter, we observe that a parallel breadth-first search (BFS) method offers better performance. However, running parallel BFS is not efficient for computing large-diameter components or a large number of small components. To address this challenge, we employ a heuristic that allows the algorithm to quickly predict the type of the network by computing the degree distribution and to follow the optimal hybrid route. Using large graphs with diverse topologies from domains including metagenomics, web crawls, social graphs, and road networks, we show that our hybrid implementation is efficient and scalable for each of the graph types. Our approach achieves a runtime of 215 seconds using 32K cores of a Cray XC30 for a metagenomic graph with over 50 billion edges. Compared against the previous state-of-the-art method, we see performance improvements of up to 24×.
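The hybrid idea described above can be illustrated with a small, sequential Python sketch. It is not the distributed implementation from the paper: the function names (`sv_components`, `bfs_components`, `looks_short_diameter`, `connected_components`) are hypothetical, and the degree-skew rule with its `skew_threshold` parameter is a placeholder assumption standing in for whatever test on the degree distribution the authors actually use. The sketch only shows the shape of the approach: a Shiloach-Vishkin-style loop of hooking and pointer jumping, a BFS alternative, and a cheap degree-based check that picks between them.

```python
from collections import Counter, deque


def sv_components(n, edges):
    """Sequential sketch of Shiloach-Vishkin-style hooking and pointer jumping."""
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Hooking: across each edge, attach the larger root under the smaller label.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru == rv:
                continue
            hi, lo = max(ru, rv), min(ru, rv)
            if parent[hi] == hi:  # only hook vertices that are still roots
                parent[hi] = lo
                changed = True
        # Pointer jumping: compress every tree into a star.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent


def bfs_components(n, edges):
    """Label each component with its smallest vertex id using repeated BFS."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = [-1] * n
    for s in range(n):
        if label[s] != -1:
            continue
        label[s] = s
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if label[y] == -1:
                    label[y] = s
                    queue.append(y)
    return label


def looks_short_diameter(n, edges, skew_threshold=10.0):
    """Placeholder heuristic (an assumption): a highly skewed degree
    distribution is taken as a hint of a short-diameter graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    if not deg:
        return False
    mean_degree = 2 * len(edges) / n
    return max(deg.values()) >= skew_threshold * mean_degree


def connected_components(n, edges):
    """Pick BFS or the SV-style loop based on the degree heuristic."""
    if looks_short_diameter(n, edges):
        return bfs_components(n, edges)
    return sv_components(n, edges)
```

Both routes label every vertex with the smallest vertex id in its component, so their outputs can be compared directly. In a distributed setting the trade-off is different from this sequential toy: the point of the heuristic is to avoid BFS on inputs where it needs many rounds, which the sketch does not model.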
