Paper Information - Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs

In recent years, the popularity of graph databases has grown rapidly. This paper focuses on the single graph as an effective model for representing information, and on the graph mining techniques related to it. Frequent pattern mining in a single-graph setting poses two main problems: the support measure and the search scheme. In this paper, we propose a novel framework for constructing support measures that brings together the existing minimum-image-based and overlap-graph-based support measures. Our framework is built on the concept of occurrence / instance hypergraphs. Based on that, we present two new support measures, the minimum instance (MI) measure and the minimum vertex cover (MVC) measure, that combine the advantages of existing measures. In particular, we show that the existing minimum-image-based support measure is an upper bound of the MI measure; the MI measure is also linear-time computable and yields counts that are close to the number of instances of a pattern. Although computing the MVC measure is NP-hard, it can be approximated to within a constant factor in polynomial time. We also provide polynomial-time relaxations for both measures and bounding theorems for all presented support measures in the hypergraph setting. We further show that the hypergraph-based framework can unify all support measures studied in this paper. The framework is also flexible in that more variants of support measures can be defined and profiled in it.

Yi-Cheng Tu | Jinghan Meng
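
To make the quantities named in the abstract more concrete, the following Python sketch computes a minimum-image-based support and a simple vertex-cover bound on an occurrence hypergraph. It is an illustration under stated assumptions, not the paper's algorithm: an occurrence is assumed to be a mapping from pattern vertices to data-graph vertices, the occurrence hypergraph is assumed to have one hyperedge per occurrence (the set of data vertices it uses), and the function names `mni_support`, `occurrence_hyperedges`, and `greedy_vertex_cover` are hypothetical. The cover routine is the textbook maximal-matching heuristic, whose result is at most k times the minimum vertex cover when every hyperedge has at most k vertices; the paper's own MI and MVC definitions and approximation schemes may differ in detail.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

# Assumption: an occurrence maps each pattern vertex to a data-graph vertex.
Occurrence = Dict[Hashable, Hashable]


def mni_support(occurrences: List[Occurrence]) -> int:
    """Minimum-image-based support: the smallest number of distinct
    data vertices that any single pattern vertex is mapped to."""
    if not occurrences:
        return 0
    images: Dict[Hashable, Set[Hashable]] = {}
    for occ in occurrences:
        for pattern_vertex, data_vertex in occ.items():
            images.setdefault(pattern_vertex, set()).add(data_vertex)
    return min(len(image) for image in images.values())


def occurrence_hyperedges(occurrences: List[Occurrence]) -> List[FrozenSet[Hashable]]:
    """One hyperedge per occurrence: the set of data vertices it uses
    (an assumed construction of the occurrence hypergraph)."""
    return [frozenset(occ.values()) for occ in occurrences]


def greedy_vertex_cover(hyperedges: List[FrozenSet[Hashable]]) -> Set[Hashable]:
    """Maximal-matching heuristic: whenever a hyperedge is not yet covered,
    add all of its vertices. The chosen hyperedges are pairwise disjoint,
    so the cover is at most k times optimal for hyperedges of size <= k."""
    cover: Set[Hashable] = set()
    for edge in hyperedges:
        if cover.isdisjoint(edge):
            cover |= edge
    return cover


# Toy example: a single-edge pattern (u, v) matched on a star with center 1.
occs = [{"u": 1, "v": 2}, {"u": 1, "v": 3}, {"u": 1, "v": 4}]
support = mni_support(occs)
cover = greedy_vertex_cover(occurrence_hyperedges(occs))
```

In this toy example there are three occurrences, but all of them share data vertex 1, so the minimum-image-based support is 1 and a single vertex covers every hyperedge. The heuristic returns a cover of size 2, which is within the factor-2 guarantee for a two-vertex pattern.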
