Finding a Summary for All Maximal Cliques

The number of maximal cliques in a graph can be exponential in the number of vertices. A clique summary is a subset of the maximal cliques that represents the full set. Finding such a summary is considered important in information distribution, influence estimation, cost-effective marketing, and similar applications. The existing approach to finding a maximal clique summary suffers from long running times because, during enumeration, it performs an excessive number of costly bound calculations to estimate the size of the cliques yet to be found. Furthermore, we found that the bound calculation is sometimes not necessary at all. We therefore propose four strategies in two directions to speed up the search for a maximal clique summary: (1) restricting the bound calculation to a particular subset of the search branches, and (2) making the best use of bounds that have already been calculated. Extensive experiments on eight real-world datasets validate our strategies. The results demonstrate that the proposed method reduces the number of bound calculations by 3 to 5 orders of magnitude, and each run of our algorithm can be up to 2.x times faster than the state-of-the-art algorithm while still keeping the summary concise. Our method can potentially benefit other enumeration-based problems with large outputs, such as frequent itemset mining, when a summary of the results is needed.
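
To make the exponential blow-up and the idea of a summary concrete, the following is a minimal Python sketch and not the method proposed here. It enumerates maximal cliques with the classic Bron-Kerbosch procedure with pivoting on a Moon-Moser graph (a complete multipartite graph with parts of size 3, which has 3^(n/3) maximal cliques), then filters them with a naive post-hoc rule. The helper names `moon_moser` and `greedy_summary`, the overlap threshold `tau`, and the coverage rule itself are assumptions made for illustration; the sketch performs no bound calculations and enumerates every clique before summarizing.

```python
def bron_kerbosch(adj, r=frozenset(), p=None, x=None):
    """Yield every maximal clique of a graph given as {vertex: set_of_neighbors}."""
    if p is None:
        p, x = set(adj), set()
    if not p and not x:
        yield r  # r cannot be extended, so it is maximal
        return
    # Pivot on the vertex with the most neighbors in p to prune branches.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)
        x.add(v)


def moon_moser(parts, part_size=3):
    """Hypothetical helper: complete multipartite graph with equal-sized parts."""
    vertices = [(i, j) for i in range(parts) for j in range(part_size)]
    return {v: {u for u in vertices if u[0] != v[0]} for v in vertices}


def greedy_summary(cliques, tau):
    """Assumed rule: keep a clique unless a kept clique covers a tau fraction of it."""
    summary = []
    for c in cliques:
        if not any(len(c & s) >= tau * len(c) for s in summary):
            summary.append(c)
    return summary


if __name__ == "__main__":
    cliques = list(bron_kerbosch(moon_moser(3)))
    summary = greedy_summary(cliques, tau=0.5)
    print(len(cliques))  # 27 maximal cliques on only 9 vertices
    print(len(summary))  # size of the kept subset
```

Adding one more part of three vertices triples the number of maximal cliques, which is why enumerating everything first and summarizing afterwards does not scale and why pruning during enumeration matters.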
