Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs

Graph databases have grown rapidly in popularity in recent years. This paper focuses on the single graph as an effective model for representing information, and on the graph mining techniques built on it. Frequent pattern mining in a single-graph setting poses two main problems: defining a support measure and designing a search scheme. We propose a novel framework for constructing support measures that brings together the existing minimum-image-based and overlap-graph-based support measures. The framework is built on the concept of occurrence/instance hypergraphs. Based on it, we present two new support measures, the minimum instance (MI) measure and the minimum vertex cover (MVC) measure, that combine the advantages of existing measures. In particular, we show that the existing minimum-image-based support measure is an upper bound of the MI measure; the MI measure is also linear-time computable and yields counts that are close to the number of instances of a pattern. Although computing the MVC measure is NP-hard, it can be approximated to within a constant factor in polynomial time. We also provide polynomial-time relaxations for both measures, together with bounding theorems for all presented support measures in the hypergraph setting. We further show that the hypergraph-based framework unifies all support measures studied in this paper. The framework is also flexible, in that more variants of support measures can be defined and profiled within it.
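To make the quantities in the abstract concrete, the following is a minimal sketch on a toy example, not the paper's implementation. It rests on several assumptions: occurrences are given as dictionaries from pattern nodes to data-graph vertices, an instance is identified by the vertex set of an occurrence (a simplification), and the MVC measure is read as the size of a minimum vertex cover of the hypergraph whose hyperedges are these instances. The function names (mni_support, instance_hyperedges, approx_vertex_cover, exact_vertex_cover) are hypothetical. The approximation shown is the standard maximal-matching heuristic, which is within a factor k of the optimum when every hyperedge has at most k vertices; the paper's own algorithms and the exact definition of the MI measure are not reproduced here.

```python
from itertools import combinations


def mni_support(occurrences):
    """Minimum-image-based support: for each pattern node, count the
    distinct data vertices it is mapped to, then take the minimum."""
    pattern_nodes = occurrences[0].keys()
    return min(len({occ[p] for occ in occurrences}) for p in pattern_nodes)


def instance_hyperedges(occurrences):
    """Collapse occurrences into instances (assumption: an instance is
    identified by its vertex set). Order of first appearance is kept."""
    return list(dict.fromkeys(frozenset(occ.values()) for occ in occurrences))


def approx_vertex_cover(hyperedges):
    """Maximal-matching heuristic: take every vertex of each hyperedge
    that is still uncovered. Returns (matching_size, cover), where
    matching_size <= optimum <= len(cover)."""
    cover, matching_size = set(), 0
    for edge in hyperedges:
        if cover.isdisjoint(edge):
            cover |= edge
            matching_size += 1
    return matching_size, cover


def exact_vertex_cover(hyperedges):
    """Brute-force minimum vertex cover size; exponential, toy inputs only."""
    vertices = sorted(set().union(*hyperedges))
    for r in range(len(vertices) + 1):
        for combo in combinations(vertices, r):
            chosen = set(combo)
            if all(edge & chosen for edge in hyperedges):
                return r
    return len(vertices)


# Toy data graph: path a - b - c - d, all vertices with the same label.
# Pattern: a single edge (u, v); each data edge yields two occurrences.
occurrences = [
    {"u": "a", "v": "b"}, {"u": "b", "v": "a"},
    {"u": "b", "v": "c"}, {"u": "c", "v": "b"},
    {"u": "c", "v": "d"}, {"u": "d", "v": "c"},
]

mni = mni_support(occurrences)
edges = instance_hyperedges(occurrences)
lower, cover = approx_vertex_cover(edges)
mvc = exact_vertex_cover(edges)
print(mni, len(edges), lower, len(cover), mvc)  # prints: 4 3 2 4 2
```

In this toy case the minimum-image-based count (4) exceeds the number of instances (3), while the vertex-cover value (2) is bracketed by the matching size (2) and the heuristic cover size (4). The brute-force routine is included only to check the bounds on small inputs.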
