Finding a Summary for All Maximal Cliques

The number of maximal cliques in a graph can be exponential in the number of vertices. A clique summary is a subset of all the maximal cliques that approximately represents the complete set. Finding such a summary is considered important in information distribution, influence estimation, cost-effective marketing, and similar applications. The existing approach to finding a maximal clique summary suffers from long running times because of the excessive number of costly bound calculations used to estimate the size of to-be-found cliques during the enumeration process. Furthermore, we found that the bound calculation is sometimes not necessary at all. We therefore propose four strategies in two directions to speed up the process of finding a maximal clique summary: (1) restricting the bound calculation to a particular subset of all search branches and (2) making the best use of previously calculated bounds. Extensive experiments on eight real-world datasets validate our strategies. The results demonstrate that the proposed method reduces the number of bound calculations by 3 to 5 orders of magnitude, and each run of our algorithm can be up to 2.x times faster than the state-of-the-art algorithm while keeping the summary concise. Our method can potentially benefit other enumeration-based problems with large outputs, such as frequent itemset mining, when a summary of the results is needed.
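The abstract does not spell out the algorithm, so what follows is only a minimal Python sketch under stated assumptions, not the authors' method. It enumerates maximal cliques with pivoted Bron–Kerbosch and builds a greedy summary, assuming the common overlap rule that a maximal clique C is covered by a summary clique S when |C ∩ S| ≥ τ|C|. Before expanding a branch, it checks whether one summary clique already covers every clique that branch can produce. A cheap test that needs no bound runs first; the costlier greedy-coloring bound on the clique size inside the candidate set is computed only when that test fails, and is then reused for the remaining summary cliques. This loosely mirrors the two directions named above (fewer bound calculations, reuse of computed bounds). The names `clique_summary`, `greedy_color_bound`, `tau`, and the `bound_calls` counter are hypothetical.

```python
def greedy_color_bound(adj, P):
    """Upper bound on the largest clique inside the vertex set P.

    A greedy proper coloring of the subgraph induced by P needs at least as
    many colors as the largest clique in P has vertices.
    """
    color = {}
    for v in sorted(P):
        used = {color[u] for u in adj[v] & P if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return max(color.values(), default=-1) + 1


def clique_summary(adj, tau):
    """Greedy tau-summary of all maximal cliques (hypothetical helper).

    Assumed coverage rule: a maximal clique C is covered by a summary clique S
    when |C & S| >= tau * |C|. Every maximal clique ends up either in the
    summary or covered by one of its members.
    """
    summary = []
    stats = {"bound_calls": 0}

    def branch_covered(R, P):
        # Any maximal clique C reported below this branch has R <= C <= R | P.
        # At most |P - S| of the vertices added to R lie outside S, and they
        # form a clique in P, so S covers every such C whenever
        # |R & S| >= tau * (|R| + min(|P - S|, ub)), ub bounding cliques in P.
        ub = None  # costly bound: computed lazily, at most once per branch
        for S in summary:
            hit, outside = len(R & S), len(P - S)
            if hit >= tau * (len(R) + outside):  # cheap test, uses no bound
                return True
            if ub is None:
                stats["bound_calls"] += 1
                ub = greedy_color_bound(adj, P)
            if hit >= tau * (len(R) + min(outside, ub)):  # reuses cached ub
                return True
        return False

    def expand(R, P, X):
        if not P and not X:  # R is a maximal clique
            if not any(len(R & S) >= tau * len(R) for S in summary):
                summary.append(frozenset(R))
            return
        if branch_covered(R, P):
            return  # every clique in this branch is already represented
        # Bron-Kerbosch with a pivot that has the most neighbors in P.
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return summary, stats


if __name__ == "__main__":
    # Two triangles sharing the edge 1-2; they overlap in 2 of 3 vertices.
    g = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
    print(clique_summary(g, tau=0.5))  # one triangle covers the other
    print(clique_summary(g, tau=1.0))  # 2/3 overlap is below tau, keep both
```

The pruning test is conservative: it only skips a branch when coverage is guaranteed for every clique the branch could report, so a looser bound (or skipping the bound entirely) never breaks correctness, it only prunes less. That is why the expensive bound can be deferred until the cheap test has already failed.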
