Discriminative Subgraph Mining for Protein Classification

Protein classification can be performed by representing 3-D protein structures as graphs and then classifying those graphs. One effective approach uses frequent subgraph patterns as features; however, its effectiveness is often hampered by the large search space of subgraph patterns. In this paper, the authors present two efficient discriminative subgraph mining algorithms, COM and GAIA. Rather than enumerating frequent subgraph patterns, these algorithms search directly for discriminative subgraph patterns, which can then be used to generate classification rules. Experimental results show that COM and GAIA achieve high classification accuracy and runtime efficiency. Additionally, they find substructures that are very close to the proteins’ actual active sites.
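To illustrate what "discriminative" can mean for a subgraph pattern, the following minimal sketch ranks patterns by how much more often they occur in one class of graphs than in the other. It is not the COM or GAIA algorithm: the log-ratio score, the function name `discriminative_score`, the smoothing constant `eps`, and the toy occurrence table are all assumptions made for illustration, and the subgraph matching step that would produce the occurrence sets is taken as already done.

```python
from math import log


def discriminative_score(pos_support, neg_support, n_pos, n_neg, eps=1e-6):
    """Log ratio of a pattern's relative frequency in positive vs. negative graphs.

    Hypothetical scoring function for illustration only; eps avoids division
    by zero when a pattern never occurs in one of the classes.
    """
    p = pos_support / n_pos
    q = neg_support / n_neg
    return log((p + eps) / (q + eps))


# Hypothetical toy data: pattern name -> (ids of positive graphs containing it,
#                                         ids of negative graphs containing it).
# In practice these sets would come from subgraph matching on protein graphs.
occurrences = {
    "A-B": ({0, 1, 2, 3}, {0, 1, 2, 3}),  # frequent everywhere, not discriminative
    "A-B-C": ({0, 1, 2}, {3}),            # mostly in the positive class
    "C-D": ({0}, {0, 1, 2}),              # mostly in the negative class
}
n_pos, n_neg = 4, 4

scores = {
    name: discriminative_score(len(pos), len(neg), n_pos, n_neg)
    for name, (pos, neg) in occurrences.items()
}

# Rank patterns from most to least indicative of the positive class.
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:+.3f}")
```

In this toy table the pattern that is frequent in both classes scores zero, which is the intuition behind searching for discriminative patterns directly instead of mining all frequent patterns first.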
