Mining Patterns from Structured Data by Beam-Wise Graph-Based Induction

A machine learning technique called Graph-Based Induction (GBI) extracts typical patterns from graph data by stepwise pair expansion (pairwise chunking). Its greedy search strategy makes it very efficient, but the search is incomplete. Its search capability is improved, without adding much computational cost, by 1) incorporating a beam search, 2) using a different evaluation function that extracts patterns that are discriminative rather than merely frequent, and 3) adopting canonical labeling to enumerate identical patterns accurately. The new algorithm, called Beam-wise GBI (B-GBI for short), was tested on a small DNA dataset from the UCI repository and was shown to extract discriminative substructures successfully.
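The three ideas above (pairwise chunking, a beam over candidate pairs, and canonical labels so that identical pairs are counted together) can be illustrated with a small sketch. It is not the authors' implementation. Several assumptions are made. The graph representation (a node-label dict plus a list of undirected labeled edges) is assumed. Information gain stands in for the unspecified "different evaluation function". Canonical labeling is reduced to its simplest case: ordering the two endpoint labels of an undirected pair, whereas the real algorithm must canonicalize whole extracted subgraphs. All function names (`pair_key`, `info_gain`, `chunk`, `beam_step`) are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical representation (not from the paper): node id -> label,
# plus a list of undirected, labeled edges (u, v, edge_label).
Graph = tuple[dict[int, str], list[tuple[int, int, str]]]
PairKey = tuple[str, str, str]


def pair_key(l1: str, e: str, l2: str) -> PairKey:
    """Canonical label of an undirected pair: (a, e, b) and (b, e, a) coincide."""
    return (l1, e, l2) if l1 <= l2 else (l2, e, l1)


def pairs_in(g: Graph) -> set[PairKey]:
    """Distinct canonical pairs occurring in one graph."""
    nodes, edges = g
    return {pair_key(nodes[u], e, nodes[v]) for u, v, e in edges}


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(graphs: list[Graph], classes: list[str], key: PairKey) -> float:
    """Assumed evaluation function: class-entropy reduction from splitting
    the graphs on whether they contain the pair `key`."""
    has = [c for g, c in zip(graphs, classes) if key in pairs_in(g)]
    lacks = [c for g, c in zip(graphs, classes) if key not in pairs_in(g)]
    n = len(classes)
    split = len(has) / n * entropy(has) + len(lacks) / n * entropy(lacks)
    return entropy(classes) - split


def chunk(g: Graph, key: PairKey, new_label: str) -> Graph:
    """Greedily contract non-overlapping occurrences of `key` into one node each."""
    nodes, edges = dict(g[0]), g[1]
    used: set[int] = set()
    absorbed: dict[int, int] = {}  # removed node id -> surviving node id
    for u, v, e in edges:
        if u == v or u in used or v in used:
            continue
        if pair_key(nodes[u], e, nodes[v]) == key:
            used.update((u, v))
            absorbed[v] = u
            nodes[u] = new_label
    for v in absorbed:
        del nodes[v]
    new_edges = []
    for u, v, e in edges:
        u2, v2 = absorbed.get(u, u), absorbed.get(v, v)
        if u2 != v2:  # drop edges that are now internal to a chunk
            new_edges.append((u2, v2, e))
    return nodes, new_edges


def beam_step(graphs: list[Graph], classes: list[str], beam_width: int, step: int):
    """One expansion: rank all candidate pairs, keep the best `beam_width`,
    and return one chunked copy of the graph set per kept pair."""
    candidates = set().union(*(pairs_in(g) for g in graphs))
    ranked = sorted(candidates, key=lambda k: (-info_gain(graphs, classes, k), k))
    return [
        (key, [chunk(g, key, f"chunk{step}_{i}") for g in graphs])
        for i, key in enumerate(ranked[:beam_width])
    ]
```

This shows a single expansion from one state. A full beam search, as the abstract describes it, would presumably apply `beam_step` to every surviving branch and keep only the best `beam_width` branches overall at each level. The frequency thresholds and stopping criteria a practical miner needs are omitted here.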
