Canonical Forms for Frequent Graph Mining

A core problem of approaches to frequent graph mining, which are based on growing subgraphs into a set of graphs, is how to avoid redundant search. A powerful technique for this is a canonical description of a graph, which uniquely identifies it, and a corresponding test. I introduce a family of canonical forms based on systematic ways to construct spanning trees. I show that the canonical form used in gSpan ([Yan and Han (2002)]) is a member of this family, and that MoSS/MoFa ([Borgelt and Berthold (2002), Borgelt et al. (2005)]) is implicitly based on a different member, which I make explicit and exploit in the same way.

[1]  Daniel A. Keim,et al.  On Knowledge Discovery and Data Mining , 1997 .

[2]  Christian Borgelt,et al.  Advanced pruning strategies to speed up mining closed molecular fragments , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583).

[3]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[4]  Christian Borgelt,et al.  MoSS: a program for molecular substructure mining , 2005 .

[5]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[6]  Ashwin Srinivasan,et al.  Pharmacophore Discovery Using the Inductive Logic Programming System PROGOL , 1998, Machine Learning.

[7]  Takashi Washio,et al.  State of the art of graph-based data mining , 2003, SKDD.

[8]  Joost N. Kok,et al.  A quickstart in frequent structure mining can make a difference , 2004, KDD.

[9]  Luc De Raedt,et al.  Molecular feature mining in HIV data , 2001, KDD '01.

[10]  Christian Borgelt,et al.  Mining molecular fragments: finding relevant substructures of molecules , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[11]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[12]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[13]  Lawrence B. Holder,et al.  Graph-Based Data Mining , 2000, IEEE Intell. Syst..

[14]  Bart Goethals,et al.  FIMI '03, Frequent Itemset Mining Implementations, Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations, 19 December 2003, Melbourne, Florida, USA , 2003, FIMI.