The Power of Pivoting for Exact Clique Counting

Clique counting is a fundamental task in network analysis, and even the simplest setting of $3$-cliques (triangles) has been the center of much recent research. Getting the count of $k$-cliques for larger $k$ is algorithmically challenging, due to the exponential blowup in the search space of large cliques. But a number of recent applications (especially for community detection or clustering) use larger clique counts. Moreover, one often desires local counts, that is, the number of $k$-cliques per vertex or per edge. Our main contribution is Pivoter, an algorithm that exactly counts the number of $k$-cliques, for all values of $k$. It is surprisingly effective in practice, and is able to get clique counts of graphs that were beyond the reach of previous work. For example, Pivoter gets all clique counts in a social network with 100M edges within two hours on a commodity machine, whereas previous parallel algorithms do not terminate in days. Pivoter can also feasibly get local per-vertex and per-edge $k$-clique counts (for all $k$) for many public data sets with tens of millions of edges. To the best of our knowledge, this is the first algorithm that achieves such results. The main insight is the construction of a Succinct Clique Tree (SCT) that stores a compressed unique representation of all cliques in an input graph. It is built using a technique called pivoting, a classic approach due to Bron and Kerbosch for reducing the recursion tree of backtracking algorithms for maximal cliques. Remarkably, the SCT can be built without actually enumerating all cliques, and provides a succinct data structure from which exact clique statistics ($k$-clique counts, local counts) can be read off efficiently.
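To make the pivoting idea concrete, the following is a minimal Python sketch of how global clique counts can be obtained from a pivoting recursion without listing individual cliques. It is an illustration under stated assumptions, not the authors' implementation: the function name `count_cliques`, the adjacency-dictionary input, and the "held"/"pivots" bookkeeping are hypothetical choices made here. The sketch does not store the tree, does not use a vertex ordering at the root, and does not compute local counts. Each leaf of the recursion is reached with some vertices that must be in the clique (held) and some pivot vertices that may be freely included or excluded, so a leaf with `held` held vertices and `pivots` pivot vertices contributes `comb(pivots, i)` cliques of size `held + i`.

```python
from math import comb


def count_cliques(adj):
    """Count k-cliques for every k in a small undirected graph.

    adj: dict mapping each vertex to the set of its neighbours
         (symmetric, no self-loops). Hypothetical input format.
    Returns a dict {k: number of k-cliques}; k = 0 counts the empty clique.
    """
    counts = {}

    def recurse(cand, held, pivots):
        # cand: vertices that can still extend the current clique.
        if not cand:
            # Leaf: any subset of the pivot vertices, together with all
            # held vertices, is a distinct clique.
            for i in range(pivots + 1):
                counts[held + i] = counts.get(held + i, 0) + comb(pivots, i)
            return

        # Pivot: the candidate with the most neighbours among the candidates.
        pivot = max(cand, key=lambda v: len(adj[v] & cand))

        # Branch 1: cliques that avoid every non-neighbour of the pivot.
        recurse(cand & adj[pivot], held, pivots + 1)

        # Branch 2: cliques whose first non-neighbour of the pivot is v.
        remaining = set(cand)
        for v in sorted(cand - adj[pivot] - {pivot}):
            recurse(adj[v] & remaining, held + 1, pivots)
            remaining.discard(v)  # later branches must not reuse v

    recurse(set(adj), 0, 0)
    return counts


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(sorted(count_cliques(g).items()))
    # prints [(0, 1), (1, 4), (2, 4), (3, 1)]
```

On the example graph the recursion visits only two leaves yet accounts for all ten cliques (including the empty one), which is the kind of compression the abstract attributes to the SCT. The sketch uses plain recursion and Python sets, so it is only suitable for small graphs.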
