Paper Information - A Graph Mining Algorithm for Classifying Chemical Compounds

Graph data mining algorithms are increasingly applied to biological graph datasets. However, while existing graph mining algorithms can identify frequently occurring sub-graphs, these sub-graphs do not necessarily represent useful patterns. In this paper, we propose a novel graph mining algorithm, MIGDAC (Mining Graph DAta for Classification), that applies graph theory and an interestingness measure to discover interesting sub-graphs that both characterize a class and are easily distinguished from other classes. Applying MIGDAC to the discovery of specific patterns of chemical compounds, we first represent each chemical compound as a graph and transform it into a set of hierarchical graphs. This representation not only captures more information than traditional formats but also simplifies complex graph structures. We then apply MIGDAC to extract a set of class-specific patterns, defined in terms of an interestingness measure and threshold based on residue analysis. The next step is to use weight of evidence to estimate whether each identified class-specific pattern positively or negatively characterizes a class of drug. Experiments on a drug dataset from the KEGG ligand database show that MIGDAC with the hierarchical graph representation greatly improves on the accuracy of traditional frequent graph mining algorithms.
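The abstract names residue analysis and weight of evidence but does not give their formulas. The sketch below illustrates one common formulation of each from the statistical pattern-discovery literature: an adjusted residual on a pattern-by-class contingency table, and a log-ratio weight of evidence. The function names, the count-based inputs, and the 1.96 significance cutoff are assumptions made for illustration, not definitions taken from the paper.

```python
import math

def adjusted_residual(n_pc, n_p, n_c, n):
    """Adjusted residual for the co-occurrence of a pattern and a class.

    n_pc: compounds in the class that contain the pattern
    n_p:  compounds that contain the pattern (any class)
    n_c:  compounds in the class
    n:    total number of compounds
    (Hypothetical count-based interface, assumed for this sketch.)
    """
    expected = n_p * n_c / n                          # count expected under independence
    standardized = (n_pc - expected) / math.sqrt(expected)
    variance = (1 - n_p / n) * (1 - n_c / n)          # approximate variance of the standardized residual
    return standardized / math.sqrt(variance)

def weight_of_evidence(n_pc, n_p, n_c, n):
    """log[P(pattern | class) / P(pattern | not class)].

    Positive values suggest the pattern supports the class,
    negative values suggest it counts against the class.
    """
    p_given_c = n_pc / n_c
    p_given_not_c = (n_p - n_pc) / (n - n_c)
    if p_given_not_c == 0:
        return math.inf    # pattern never seen outside the class
    if p_given_c == 0:
        return -math.inf   # pattern never seen inside the class
    return math.log(p_given_c / p_given_not_c)

# Toy counts (made up): 100 compounds, 40 in the class,
# 30 contain the pattern, 25 of those are in the class.
d = adjusted_residual(25, 30, 40, 100)
w = weight_of_evidence(25, 30, 40, 100)
is_interesting = abs(d) > 1.96   # assumed cutoff: 95% level of a standard normal
```

With these toy counts the residual is well above the assumed cutoff and the weight of evidence is positive, so under this formulation the pattern would be read as positively characterizing the class.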

Keith C. C. Chan | Winnie W. M. Lam
