Listing k-cliques in Sparse Real-World Graphs

Motivated by recent studies in the data mining community that require efficiently listing all k-cliques, we revisit the iconic algorithm of Chiba and Nishizeki and develop the most efficient parallel algorithm for this problem. Our theoretical analysis provides the best asymptotic upper bound on the running time of our algorithm when the input graph is sparse. Our experimental evaluation on large real-world graphs shows that our parallel algorithm is faster than state-of-the-art algorithms while achieving an excellent degree of parallelism. In particular, we are able to list all k-cliques (for any k) in graphs containing up to tens of millions of edges, and all 10-cliques in graphs containing billions of edges, within a few minutes and a few hours, respectively. Finally, we show how our algorithm can be employed as an effective subroutine for finding the k-clique core decomposition and an approximate k-clique densest subgraph in very large real-world graphs.
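The recursion behind Chiba–Nishizeki-style k-clique listing can be illustrated in a few lines. The sketch below is a minimal sequential illustration under our own assumptions and is not the authors' parallel implementation. We assume the graph is given as an undirected adjacency dict and that every edge is oriented from the earlier to the later vertex of a degeneracy (smallest-last) ordering, so each clique is produced exactly once, in increasing rank. Out-neighbourhoods are then intersected recursively. The function names `degeneracy_order`, `list_k_cliques` and `clique_degree` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Set, Tuple

# Assumed representation: undirected graph as {vertex: set of neighbours}.
Graph = Dict[int, Set[int]]


def degeneracy_order(graph: Graph) -> List[int]:
    """Peel vertices by repeatedly removing one of minimum remaining degree.

    This simple version is O(n^2); a bucket queue would make it linear.
    """
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    remaining = set(graph)
    order: List[int] = []
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        order.append(v)
        remaining.remove(v)
        for u in graph[v]:
            if u in remaining:
                degree[u] -= 1
    return order


def list_k_cliques(graph: Graph, k: int) -> Iterator[Tuple[int, ...]]:
    """Yield every k-clique exactly once (vertices in increasing rank)."""
    if k < 1:
        return
    rank = {v: i for i, v in enumerate(degeneracy_order(graph))}
    # Orient each edge from lower to higher rank: out[v] = later neighbours.
    out = {v: {u for u in nbrs if rank[u] > rank[v]} for v, nbrs in graph.items()}

    def extend(clique: List[int], candidates: Set[int], needed: int) -> Iterator[Tuple[int, ...]]:
        if needed == 0:
            yield tuple(clique)
            return
        if len(candidates) < needed:  # too few candidates left to finish a clique
            return
        for v in candidates:
            clique.append(v)
            # The next vertex must be adjacent to every vertex chosen so far
            # and come later in the ordering, hence the intersection with out[v].
            yield from extend(clique, candidates & out[v], needed - 1)
            clique.pop()

    yield from extend([], set(graph), k)


def clique_degree(graph: Graph, k: int) -> Counter:
    """Count how many k-cliques contain each vertex."""
    counts: Counter = Counter()
    for clique in list_k_cliques(graph, k):
        counts.update(clique)
    return counts
```

Consider a 4-clique on vertices 0 to 3 with a pendant vertex 4 attached to vertex 3. For this graph, `list_k_cliques(g, 3)` yields the four triangles of the 4-clique and `list_k_cliques(g, 4)` yields the single 4-clique. The degeneracy ordering bounds every out-degree by the graph's degeneracy, which keeps the candidate sets small on sparse graphs. Per-vertex counts such as those returned by `clique_degree` are the kind of score a k-clique core decomposition would peel on.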

[1] Gregory Buehrer et al. A scalable pattern mining approach to web graph compression with communities. WSDM '08, 2008.

[2] Seshadhri Comandur et al. A Fast and Provable Method for Estimating Clique Counts Using Turán's Theorem. WWW, 2016.

[3] Charalampos E. Tsourakakis. The K-clique Densest Subgraph Problem. WWW, 2015.

[4] Ümit V. Çatalyürek et al. Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions. WWW, 2014.

[5] James Cheng et al. Triangle listing in massive networks and its applications. KDD, 2011.

[6] Lev Muchnik et al. Identifying influential spreaders in complex networks. arXiv:1001.5285, 2010.

[7] Irene Finocchi et al. Clique Counting in MapReduce. ACM J. Exp. Algorithmics, 2014.

[8] Ali Pinar et al. ESCAPE: Efficiently Counting All 5-Vertex Subgraphs. WWW, 2016.

[9] P. Berman et al. On Some Tighter Inapproximability Results. Electron. Colloquium Comput. Complex., 1998.

[10] Luca Becchetti et al. Efficient semi-streaming algorithms for local triangle counting in massive graphs. KDD, 2008.

[11] Moses Charikar et al. Greedy approximation algorithms for finding dense components in a graph. APPROX, 2000.

[12] Ryan A. Rossi et al. Efficient Graphlet Counting for Large Networks. IEEE International Conference on Data Mining, 2015.

[13] Noga Alon et al. Color-coding. JACM, 1995.

[14] James Cheng et al. Fast algorithms for maximal clique enumeration with limited memory. KDD, 2012.

[15] Serafim Batzoglou et al. MotifCut: regulatory motifs finding with maximum density subgraphs. ISMB, 2006.

[16] Divesh Srivastava et al. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. The VLDB Journal, 2012.

[17] Takeaki Uno. Implementation issues of clique enumeration algorithm (Special issue: Theoretical computer science and discrete mathematics). 2012.

[18] Takao Nishizeki et al. Edge-Coloring and f-Coloring for Various Classes of Graphs. J. Graph Algorithms Appl., 1994.

[19] T. Vicsek et al. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005.

[20] Enrico Gregori et al. Parallel k-Clique Community Detection on Large-Scale Networks. IEEE Transactions on Parallel and Distributed Systems, 2013.

[21] Hosung Park et al. What is Twitter, a social network or a news media? WWW '10, 2010.

[22] Jakub W. Pachocki et al. Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling. KDD, 2015.

[23] Nagiza F. Samatova et al. A scalable, parallel algorithm for maximal clique enumeration. J. Parallel Distributed Comput., 2009.

[24] Leland L. Beck et al. Smallest-last ordering and clustering and graph coloring algorithms. JACM, 1983.

[25] W. Art Chaovalitwongse et al. Adaptive epileptic seizure prediction system. IEEE Transactions on Biomedical Engineering, 2003.

[26] Uri Zwick et al. Listing Triangles. ICALP, 2014.

[27] Norishige Chiba et al. Arboricity and Subgraph Listing Algorithms. SIAM J. Comput., 1985.

[28] Ravi Kumar et al. Discovering Large Dense Subgraphs in Massive Graphs. VLDB, 2005.

[29] Roberto De Virgilio et al. Finding All Maximal Cliques in Very Large Social Networks. EDBT, 2016.

[30] Noga Alon et al. Finding and counting given length cycles. Algorithmica, 1997.

[31] Maximilien Danisch et al. Finding Heaviest k-Subgraphs and Events in Social Media. IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016.

[32] Angela P. Presson et al. Integrated Weighted Gene Co-expression Network Analysis with an Application to Chronic Fatigue Syndrome. BMC Systems Biology, 2008.

[33] Yang Xiang et al. 3-HOP: a high-compression indexing scheme for reachability query. SIGMOD Conference, 2009.

[34] Ravi Kumar et al. Counting Graphlets: Space vs Time. WSDM, 2017.

[35] Akira Tanaka et al. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci., 2006.

[36] Jure Leskovec et al. Higher-order organization of complex networks. Science, 2016.

[37] Marek Karpinski et al. On Some Tighter Inapproximability Results (Extended Abstract). ICALP, 1999.

[38] Tamara G. Kolda et al. Counting Triangles in Massive Graphs with MapReduce. SIAM J. Sci. Comput., 2013.

[39] David Eppstein et al. Listing All Maximal Cliques in Large Sparse Real-World Graphs. JEAL, 2011.

[40] Liang Ding et al. Migration motif: a spatial-temporal pattern mining approach for financial markets. KDD, 2009.

[41] Charu C. Aggarwal et al. A Survey of Algorithms for Dense Subgraph Discovery. Managing and Mining Graph Data, 2010.

[42] T.-H. Hubert Chan et al. Large Scale Density-friendly Graph Decomposition via Convex Programming. WWW, 2017.

[43] Kazuhisa Makino et al. New Algorithms for Enumerating All Maximal Cliques. SWAT, 2004.

[44] David Eppstein et al. Listing All Maximal Cliques in Sparse Graphs in Near-optimal Time. Exact Complexity of NP-hard Problems, 2010.

[45] Matthieu Latapy et al. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci., 2008.

[46] Marco Pellegrini et al. Extraction and classification of dense implicit communities in the Web graph. TWEB, 2009.

[47] Silvio Lattanzi et al. Efficient Densest Subgraph Computation in Evolving Graphs. WWW, 2015.

[48] Jure Leskovec et al. SNAP Datasets: Stanford Large Network Dataset Collection. 2014.

[49] C. Bron et al. Algorithm 457: finding all cliques of an undirected graph. 1973.

[50] Christos Faloutsos et al. CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies and Algorithms. IEEE 16th International Conference on Data Mining (ICDM), 2016.

[51] Francesco Bonchi et al. Finding Subgraphs with Maximum Total Density and Limited Overlap. WSDM, 2015.

[52] Stanley Wasserman et al. Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences, 1994.