Connected components of big graphs in fixed MapReduce rounds

This paper introduces a fast distributed approach, based on MapReduce, for finding the connected components of big graphs. Unlike previous approaches, in which the number of rounds depends on the graph topology (especially the graph's diameter), our method is the first to compute a graph's connected components in a fixed number of rounds. Using this method, the connected components of large graphs with diameter greater than 6000 can be determined in only two MapReduce rounds. In the first MapReduce round, the local connected components of partial sub-graphs are computed independently, without considering the other sub-graphs. In the subsequent rounds, the detected local partial connected components are merged to form the final connected components of the original big graph. Experiments show that this method is more efficient than other well-known methods in terms of execution time and communication cost.
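
The two-phase idea described above (local components per sub-graph, then a merge) can be illustrated with a small single-process sketch. This is not the paper's implementation: the round-robin edge partitioning, the union-find structure, the single merge step, and all names (`DisjointSet`, `local_components`, `merge_components`, `connected_components`) are assumptions made for illustration, and the graph is assumed to be given as an edge list, so isolated vertices are not represented.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over hashable, comparable vertex ids (illustrative helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller id as the root so labels are deterministic.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def local_components(edges):
    """Round 1 (assumed shape): label the vertices of one edge partition
    without looking at any other partition."""
    ds = DisjointSet()
    for u, v in edges:
        ds.union(u, v)
    return [(v, ds.find(v)) for v in list(ds.parent)]


def merge_components(partials):
    """Round 2 (assumed shape): a vertex that appears in several partitions
    links their local labels, which merges the partial components."""
    ds = DisjointSet()
    for pairs in partials:
        for vertex, label in pairs:
            ds.union(vertex, label)
    return {v: ds.find(v) for v in list(ds.parent)}


def connected_components(edges, num_partitions=2):
    # Hypothetical round-robin split of the edge list into sub-graphs.
    partitions = [edges[i::num_partitions] for i in range(num_partitions)]
    partials = [local_components(p) for p in partitions]
    labels = merge_components(partials)
    groups = defaultdict(set)
    for vertex, root in labels.items():
        groups[root].add(vertex)
    return sorted(sorted(group) for group in groups.values())
```

In this sketch the number of phases stays at two regardless of diameter: a path graph built as `[(i, i + 1) for i in range(1000)]` and split into four partitions still resolves to a single component after one local pass and one merge. A real MapReduce job would also have to decide how the merge is distributed across reducers, which this sketch does not address.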
