Parallel Graph Algorithms in Constant Adaptive Rounds: Theory meets Practice

We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC) model, a theoretical model that captures MapReduce-like computation augmented with a distributed hash table. We show the first AMPC algorithms for all of the studied problems that run in a constant number of rounds and use only $O(n^\epsilon)$ space per machine, where $0 < \epsilon < 1$. Our results improve upon both the previous results in the AMPC model and the best-known results in the MPC model, the theoretical model underpinning many popular distributed computation frameworks, such as MapReduce, Hadoop, Beam, Pregel, and Giraph. Finally, we provide an empirical comparison of algorithms in the MPC and AMPC models in a fault-tolerant distributed computation environment. We evaluate our algorithms on a set of large real-world graphs and show that our AMPC algorithms can achieve improvements in both running time and round complexity over optimized MPC baselines.
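To make the role of the distributed hash table concrete, the following is a minimal single-process sketch of one adaptive round. It is not one of the algorithms from this work. It assumes, as an illustration, that the hash table written in the previous round is read-only during the current round and that a machine may issue a bounded number of reads, each of which can depend on the result of the previous one. The names `ampc_round_find_roots` and `query_budget` are hypothetical.

```python
from typing import Dict


def ampc_round_find_roots(parent: Dict[int, int], query_budget: int) -> Dict[int, int]:
    """Simulate one adaptive round over a read-only table of parent pointers.

    `parent` stands in for the distributed hash table produced by the previous
    round. Each vertex follows pointers adaptively, using at most
    `query_budget` reads (an assumed stand-in for the per-machine space bound).
    """
    new_table: Dict[int, int] = {}
    for v in parent:
        cur = v
        for _ in range(query_budget):
            nxt = parent[cur]  # adaptive read: the key depends on the last answer
            if nxt == cur:     # reached a root (a vertex pointing to itself)
                break
            cur = nxt
        new_table[v] = cur     # written out for the next round
    return new_table


# A small forest: 3 -> 2 -> 1 -> 0 (root) and 5 -> 4 (root).
parent = {0: 0, 1: 0, 2: 1, 3: 2, 4: 4, 5: 4}
roots = ampc_round_find_roots(parent, query_budget=8)
one_hop = ampc_round_find_roots(parent, query_budget=1)
```

With a budget large enough to cover the longest path, every vertex learns its root within a single round. With `query_budget=1` each vertex advances only one hop, which resembles what a non-adaptive round can do with the same table.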
