Shortcutting Label Propagation for Distributed Connected Components

Connected Components (CC) is a fundamental graph mining problem that has been studied for the PRAM, MapReduce, and BSP models. We present a simple CC algorithm for BSP that does not mutate the graph, converges in O(log n) supersteps, and scales to graphs with trillions of edges.
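
To make the idea in the title concrete, the following is a minimal single-machine sketch of min-label propagation combined with a shortcutting (pointer-jumping) step, simulating synchronous supersteps. It is an illustrative assumption, not the algorithm presented in the paper: the function name `connected_components`, the edge-list representation, and the exact hooking rule are all hypothetical, and no claim is made that this sketch attains the O(log n) superstep bound. It does share one property named in the abstract: the edge list is never modified.

```python
def connected_components(n, edges):
    """Label each vertex 0..n-1 with the smallest vertex id in its component.

    Hypothetical sketch: each loop iteration plays the role of one superstep.
    The edge list is only read, never mutated.
    """
    label = list(range(n))  # every vertex starts as its own label
    while True:
        new = label[:]
        # Propagation: for each edge, offer the smaller endpoint label to both
        # endpoints and to the vertices they currently point to.
        for u, v in edges:
            cand = min(label[u], label[v])
            for target in (u, v, label[u], label[v]):
                if cand < new[target]:
                    new[target] = cand
        # Shortcutting: replace each label by the label of that label,
        # which shortens chains of pointers.
        new = [new[new[v]] for v in range(n)]
        if new == label:  # no label changed, so every edge is consistent
            return label
        label = new


if __name__ == "__main__":
    # A path 0-1-2-3, an edge 4-5, and an isolated vertex 6.
    print(connected_components(7, [(0, 1), (1, 2), (2, 3), (4, 5)]))
    # prints [0, 0, 0, 0, 4, 4, 6]
```

Labels only ever decrease and always name a vertex in the same component, so the loop terminates with the minimum vertex id of each component as its label.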
