Generalization for Frequent Subgraph Mining

Mining frequent subgraphs from a dataset of general graphs is computationally expensive because it involves both a combinatorially explosive search for candidate subgraphs and subgraph isomorphism matching. Although some approaches have been proposed to derive characteristic patterns from graph-structured data, they restrict the graphs that can be searched to a specific class. In this paper, we propose an approach that conducts a complete search for various classes of frequent subgraphs in a massive dataset of labeled graphs in practical time. The power of our approach comes from an algebraic representation of graphs, its associated operations, and well-organized bias constraints that limit the search space efficiently. Its performance has been evaluated on real-world datasets, and the results confirm that the approach scales well with respect to the amount of data and the computation time.
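To make the cost of the two steps named above concrete, the following is a minimal brute-force sketch of support counting for one candidate pattern. It is not the algebraic method proposed in the paper; it is a naive baseline written under stated assumptions: graphs are simple, undirected and node-labeled, matching is non-induced, and the names `make_graph`, `contains`, and `support` are hypothetical helpers introduced only for illustration.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Build a simple undirected labeled graph (assumed representation).

    labels: dict mapping node id -> label
    edges:  iterable of (u, v) pairs with u != v
    """
    return labels, {frozenset(e) for e in edges}


def contains(graph, pattern):
    """Return True if `pattern` occurs in `graph` as a (non-induced) subgraph."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Try every injective assignment of pattern nodes to graph nodes.
    # The number of assignments grows factorially, which is the source
    # of the high matching cost for larger patterns.
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Node labels must agree under the mapping.
        if any(p_labels[u] != g_labels[mapping[u]] for u in p_nodes):
            continue
        # Every pattern edge must map onto an edge of the graph.
        if all(frozenset(mapping[u] for u in e) in g_edges for e in p_edges):
            return True
    return False


def support(dataset, pattern):
    """Fraction of graphs in `dataset` that contain `pattern`."""
    return sum(contains(g, pattern) for g in dataset) / len(dataset)


# Toy dataset of three small labeled graphs (illustrative data only).
dataset = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    make_graph({0: "C", 1: "O"}, [(0, 1)]),
    make_graph({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
]
c_o = make_graph({"a": "C", "b": "O"}, [("a", "b")])
c_n = make_graph({"a": "C", "b": "N"}, [("a", "b")])

print(support(dataset, c_o))
print(support(dataset, c_n))
```

A pattern counts as frequent when its support reaches a user-chosen minimum threshold. A complete miner has to repeat this check for every candidate subgraph, which is why constraints that prune the candidate space matter as much as the matching step itself.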
