Massively Parallel Algorithms for Finding Well-Connected Components in Sparse Graphs

Massively parallel computation (MPC) algorithms for graph problems have witnessed a resurgence of interest in recent years. Despite major progress on numerous graph problems, however, the complexity of the sparse graph connectivity problem in this model has remained elusive: while classical logarithmic-round PRAM algorithms for finding connected components in any n-vertex graph have been known for more than three decades (and imply the same bounds for the MPC model), no o(log n)-round MPC algorithms are known for this task with memory per machine that is truly sublinear in n (which is the only interesting regime for sparse graphs with O(n) edges). It is conjectured that an o(log n)-round algorithm for connectivity on general sparse graphs with n^(1-Ω(1)) per-machine memory may not exist, a conjecture that also forms the basis for multiple conditional hardness results on the round complexity of other problems in the MPC model. We take an opportunistic approach towards the sparse graph connectivity problem by designing an algorithm whose performance improves with the connectivity structure of the input graph. Formally, we design an MPC algorithm that finds all connected components with spectral gap at least λ in a graph in O(log log n + log(1/λ)) MPC rounds and n^δ memory per machine for any constant δ ∈ (0,1). While this algorithm still requires Θ(log n) rounds in the worst case, it achieves an exponential round reduction on "well-connected" components with λ ≥ 1/polylog(n) using only n^δ memory per machine and Õ(n) total memory, and still operates in o(log n) rounds even when λ = 1/n^(o(1)). En route to our main result, we design a new distributed data structure for performing independent random walks from all vertices simultaneously, as well as a new leader-election algorithm with exponentially faster round complexity on random graphs.
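
To make the stated round bound concrete, the following is a minimal sketch that evaluates log log n + log(1/λ) for two sample spectral gaps. It assumes base-2 logarithms and sets the constants hidden by the O(·) notation to 1; the function name round_bound and the sample values are illustrative and do not come from the paper.

```python
import math


def round_bound(log2_n: float, spectral_gap: float) -> float:
    """Evaluate log log n + log(1/lambda) with all hidden constants set to 1.

    log2_n is log2 of the number of vertices n; spectral_gap is lambda.
    This is only the expression inside the O(.), not a measured round count.
    """
    return math.log2(log2_n) + math.log2(1.0 / spectral_gap)


log2_n = 64.0                        # n = 2^64 vertices, so log n = 64
well_connected = 1.0 / log2_n ** 2   # lambda = 1/log^2 n, i.e. 1/polylog(n)
poorly_connected = 2.0 ** -log2_n    # lambda = 1/n, a worst-case style gap

print(round_bound(log2_n, well_connected))    # 6 + 12 = 18.0, far below log n = 64
print(round_bound(log2_n, poorly_connected))  # 6 + 64 = 70.0, on the order of log n
```

With λ ≥ 1/polylog(n) the log(1/λ) term is itself O(log log n), which is the exponential round reduction relative to Θ(log n); with λ as small as 1/n the bound falls back to Θ(log n).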
