Graph-Based Induction for General Graph Structured Data and Its Application to Chemical Compound Data

Many relations can be represented as graph structures, e.g., chemical bonding, Web browsing records, DNA sequences, and inference patterns (program traces). Efficiently finding characteristic substructures in a graph is therefore a useful technique in many important KDD/ML applications. However, graph pattern matching is a hard problem. We propose a machine learning technique called Graph-Based Induction (GBI) that efficiently extracts typical patterns from graph data in an approximate manner by stepwise pair expansion (pairwise chunking). It can handle general graph-structured data, i.e., directed or undirected graphs, with colored or uncolored nodes, with or without loops (including self-loops), and with colored or uncolored links. We show that its time complexity is almost linear in the size of the graph. We further show that GBI can be applied effectively to extract typical patterns from chemical compound data, from which classification rules are then generated, and that GBI also works as a feature construction component for other machine learning tools.
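The idea of stepwise pair expansion can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a directed graph with colored (labeled) nodes and uncolored links, selects pairs purely by raw frequency, and uses hypothetical names (`most_frequent_pair`, `chunk_pair`, `gbi_sketch`, `min_count`, `max_steps`) chosen only for illustration. At each step, the most frequent pair of connected node labels is found and its non-overlapping occurrences are rewritten into single nodes whose label records the pair.

```python
from collections import Counter


def most_frequent_pair(labels, edges):
    """Return ((source label, target label), count) for the most frequent
    label pair over directed edges, ignoring self-loops."""
    counts = Counter((labels[u], labels[v]) for u, v in edges if u != v)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def chunk_pair(labels, edges, pair):
    """Merge non-overlapping occurrences of `pair` into single nodes.

    The source node survives and takes the pair (a nested tuple) as its
    new label; the target node is absorbed into it.
    """
    used = set()        # nodes already consumed by a chunk in this step
    merged = {}         # absorbed node -> surviving node
    new_labels = dict(labels)
    for u, v in edges:
        if u == v or u in used or v in used:
            continue
        if (labels[u], labels[v]) == pair:
            used.update((u, v))
            merged[v] = u
            new_labels[u] = pair
            del new_labels[v]
    new_edges = []
    for u, v in edges:
        if merged.get(v) == u:
            continue    # edge internal to a chunk disappears
        new_edges.append((merged.get(u, u), merged.get(v, v)))
    return new_labels, new_edges


def gbi_sketch(labels, edges, min_count=2, max_steps=10):
    """Repeat pair selection and chunking; return the chosen pairs with the
    number of edges that carried each pair when it was selected."""
    patterns = []
    for _ in range(max_steps):
        pair, count = most_frequent_pair(labels, edges)
        if pair is None or count < min_count:
            break
        patterns.append((pair, count))
        labels, edges = chunk_pair(labels, edges, pair)
    return patterns


# Toy graph: three C->O edges, two of which lead on to a shared N node.
toy_labels = {0: "C", 1: "O", 2: "C", 3: "O", 4: "C", 5: "O", 6: "N"}
toy_edges = [(0, 1), (2, 3), (4, 5), (1, 6), (3, 6)]
toy_patterns = gbi_sketch(toy_labels, toy_edges)
```

On the toy graph, the first step chunks the pair `("C", "O")` and the second step extends that chunk with `"N"`, so larger patterns grow out of smaller ones. Each step makes a single pass over the edge list, which hints at why this style of approximate extraction scales well, although the sketch makes no claim to reproduce the complexity result stated above.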
