Support Computation for Mining Frequent Subgraphs in a Single Graph

Defining the support (or frequency) of a subgraph is trivial when a database of graphs is given: it is simply the number of graphs in the database that contain the subgraph. However, if the input is one large graph, it is surprisingly difficult to find an appropriate support definition. In this paper, we study the core problem, namely overlapping embeddings of the subgraph, in detail and suggest a definition that relies on the non-existence of equivalent ancestor embeddings to guarantee that the resulting support is anti-monotone. We prove this property and describe a method to compute the support defined in this way.
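To see why overlapping embeddings cause trouble, the following minimal Python sketch counts embeddings by brute force on a toy graph. It is an illustration only and does not implement the support definition proposed in the paper. The function names (`count_embeddings`, `database_support`), the dictionary-based graph encoding, and the star-shaped example graph are all assumptions made for this sketch.

```python
from itertools import permutations

def count_embeddings(p_labels, p_edges, g_labels, g_edges):
    """Count injective, label- and edge-preserving mappings of a pattern
    into an undirected graph (brute force, suitable for tiny inputs only)."""
    g_edge_set = {frozenset(e) for e in g_edges}
    p_nodes = list(p_labels)
    count = 0
    # Try every injective assignment of graph nodes to pattern nodes.
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(p_labels[p] != g_labels[mapping[p]] for p in p_nodes):
            continue  # node labels do not match
        if all(frozenset((mapping[u], mapping[v])) in g_edge_set
               for u, v in p_edges):
            count += 1
    return count

def database_support(p_labels, p_edges, graphs):
    """Database setting: number of graphs that contain the pattern."""
    return sum(1 for g_labels, g_edges in graphs
               if count_embeddings(p_labels, p_edges, g_labels, g_edges) > 0)

# Hypothetical single input graph: one A node connected to three B nodes.
g_labels = {0: "A", 1: "B", 2: "B", 3: "B"}
g_edges = [(0, 1), (0, 2), (0, 3)]

# Pattern 1: a single A node. Pattern 2: its extension, an A-B edge.
single_a = count_embeddings({"x": "A"}, [], g_labels, g_edges)
edge_ab = count_embeddings({"x": "A", "y": "B"}, [("x", "y")],
                           g_labels, g_edges)
print(single_a, edge_ab)
```

Under these assumptions, the single node A has one embedding, while its extension A-B has three, all of which overlap in the A node. The raw embedding count therefore grows when the pattern is extended, so it is not anti-monotone. In the database setting, by contrast, a graph that contains the larger pattern necessarily contains the smaller one, so `database_support` cannot increase under extension.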
