An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data

This paper proposes AGM, a novel approach for efficiently mining association rules among frequently appearing substructures in a given graph data set. Each graph transaction is represented by an adjacency matrix, and the frequent patterns appearing in the matrices are mined with an extension of the Apriori algorithm used in basket analysis. Its performance has been evaluated on artificial simulation data and on the carcinogenesis data of Oxford University and NTP. The results confirm that it is efficient enough for problems of real-world size.
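
To make the representation concrete, the following is a minimal sketch of graph transactions stored as adjacency matrices, together with a support count for a candidate substructure. It is not the AGM algorithm itself: the matching is a brute-force search over vertex assignments, induced-subgraph matching is assumed, and the names `contains`, `support`, `frequent`, and the toy atom labels are hypothetical, chosen only for illustration.

```python
from itertools import permutations

# A graph (transaction or pattern) is a pair: (vertex labels, adjacency matrix).
# The adjacency matrix is a symmetric 0/1 list of lists (undirected, unlabeled edges).


def contains(graph, pattern):
    """Return True if `pattern` occurs in `graph` as an induced subgraph.

    Assumption: brute force over all ordered vertex selections; this is
    exponential and only meant for tiny illustrative graphs.
    """
    g_labels, g_adj = graph
    p_labels, p_adj = pattern
    k = len(p_labels)
    for combo in permutations(range(len(g_labels)), k):
        labels_match = all(g_labels[combo[i]] == p_labels[i] for i in range(k))
        edges_match = all(
            g_adj[combo[i]][combo[j]] == p_adj[i][j]
            for i in range(k)
            for j in range(k)
        )
        if labels_match and edges_match:
            return True
    return False


def support(pattern, transactions):
    """Fraction of transactions that contain the pattern."""
    hits = sum(1 for g in transactions if contains(g, pattern))
    return hits / len(transactions)


def frequent(patterns, transactions, min_support):
    """Keep the candidate patterns whose support reaches `min_support`."""
    return [p for p in patterns if support(p, transactions) >= min_support]


# Toy data set of three graph transactions (hypothetical molecules).
transactions = [
    (["C", "C", "O"], [[0, 1, 0], [1, 0, 1], [0, 1, 0]]),  # C-C-O path
    (["C", "O"], [[0, 1], [1, 0]]),                        # C-O edge
    (["C", "C"], [[0, 1], [1, 0]]),                        # C-C edge
]

c_o_edge = (["C", "O"], [[0, 1], [1, 0]])
c_c_o_path = (["C", "C", "O"], [[0, 1, 0], [1, 0, 1], [0, 1, 0]])

candidates = [c_o_edge, c_c_o_path]
kept = frequent(candidates, transactions, min_support=0.5)
```

With a minimum support of 0.5, only the C-O edge survives, since the three-vertex path occurs in a single transaction. An Apriori-style miner would then build larger candidates only from the substructures that survive at the previous size, which is what keeps the search tractable.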
