Parallel Algorithms for Small Subgraph Counting

Subgraph counting is a fundamental problem in the analysis of massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing efficient, accurate, and scalable algorithms for this problem. In this work, we design several new algorithms for subgraph counting in the Massively Parallel Computation (MPC) model. Given a graph $G$ with $n$ vertices, $m$ edges, and $T$ triangles, our first main result is an algorithm that, with high probability, outputs a $(1+\varepsilon)$-approximation to $T$ with optimal round and space complexity, provided that the space per machine is $S \geq \max(\sqrt{m}, n^2/m)$ and assuming $T=\Omega(\sqrt{m/n})$. Our second main result is an $\tilde{O}_{\delta}(\log \log n)$-round algorithm for exactly counting the number of triangles, parametrized by the arboricity $\alpha$ of the input graph. The space per machine is $O(n^{\delta})$ for any constant $\delta$, and the total space is $O(m\alpha)$, which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting $k$-cliques for any constant $k$, with the same round complexity and total space $O(m\alpha^{k-2})$. Alternatively, allowing $O(\alpha^2)$ space per machine, the total space requirement reduces to $O(n\alpha^2)$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in $\tilde{O}_{\delta}(\sqrt{\log n})$ rounds, with $O(n^{\delta})$ space per machine and $O(m\alpha^3)$ total space. This result therefore also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.
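For intuition about the sequential $O(m\alpha)$ bound that the total-space bound is compared against, the following is a minimal single-machine sketch of combinatorial triangle counting by degree-ordered edge orientation. It is not the MPC algorithm of this work. The function name `count_triangles` and the edge-list input format are assumptions made for illustration, and vertex identifiers are assumed to be mutually comparable (for example, integers).

```python
from collections import defaultdict


def count_triangles(edges):
    """Exactly count triangles in an undirected simple graph.

    `edges` is an iterable of (u, v) pairs; self-loops and duplicate
    edges are ignored. This is an illustrative sequential sketch, not
    the parallel algorithm described in the abstract.
    """
    # Build an adjacency-set representation of the graph.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, id); ties are broken by the identifier.
    rank = {v: (len(adj[v]), v) for v in adj}

    # Orient every edge from its lower-ranked to its higher-ranked endpoint.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    # A triangle a < b < c (by rank) is found exactly once: at the
    # oriented edge (a, b), where c lies in both out-neighborhoods.
    total = 0
    for u in out:
        for v in out[u]:
            total += len(out[u] & out[v])
    return total


if __name__ == "__main__":
    # The complete graph on 4 vertices contains 4 triangles.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))
```

Orienting edges toward higher-ranked endpoints keeps each intersection small on low-arboricity graphs, which is the idea behind sequential bounds of this form.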
