Paper Information - Non-Alignment Features Based Enzyme/Non-Enzyme Classification Using an Ensemble Method

Non-Alignment Features Based Enzyme/Non-Enzyme Classification Using an Ensemble Method

As a growing number of protein structures are resolved without known functions, computational methods that help predict protein function from structure are becoming increasingly important. Some computational methods predict protein function by aligning a protein to homologous proteins with known functions, but they fail when no such homology can be identified. In this paper we classify enzymes and non-enzymes using non-alignment features. We propose a new ensemble method that combines three support vector machines (SVMs) and two k-nearest neighbor (k-NN) classifiers under a simple majority voting rule. On a data set of 697 enzymes and 480 non-enzymes adapted from Dobson and Doig, the method achieves 85.59% accuracy in 10-fold cross-validation and 86.49% accuracy in leave-one-out validation. This accuracy is much better than that of other methods based on non-alignment features and even slightly better than that of methods based on alignment features. To our knowledge, this is the first use of an ensemble method to classify enzymes and non-enzymes, and the ensemble is superior to a single classifier.
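The simple majority voting rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the features, SVM kernels, or values of k, so the toy 2-D data, the `make_knn` and `make_linear` helpers, and the fixed linear decision functions standing in for trained SVMs are all assumptions made for illustration. Only the structure (five binary voters, three SVM-like and two k-NN, combined by majority) follows the text.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier maps a feature vector to 1 (enzyme) or 0 (non-enzyme).
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by most classifiers.

    With an odd number of binary voters (five here), a tie cannot occur.
    """
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def make_knn(train_X, train_y, k: int) -> Classifier:
    """Hypothetical helper: plain k-NN with squared Euclidean distance."""
    def predict(x: Sequence[float]) -> int:
        order = sorted(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
        )
        nearest_labels = [train_y[i] for i in order[:k]]
        return Counter(nearest_labels).most_common(1)[0][0]
    return predict


def make_linear(w: Sequence[float], b: float) -> Classifier:
    """Hypothetical stand-in for a trained linear SVM: the sign of w.x + b.

    The weights are fixed by hand here, not learned from data.
    """
    def predict(x: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if score > 0 else 0
    return predict


# Toy, made-up 2-D feature vectors (not the Dobson and Doig data set).
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]

ensemble: List[Classifier] = [
    make_linear([1.0, 1.0], -1.0),   # SVM-like voter 1
    make_linear([1.0, 0.0], -0.5),   # SVM-like voter 2
    make_linear([0.0, 1.0], -0.5),   # SVM-like voter 3
    make_knn(train_X, train_y, k=1),  # k-NN voter 1
    make_knn(train_X, train_y, k=3),  # k-NN voter 2
]

if __name__ == "__main__":
    for point in ([0.95, 1.0], [0.05, 0.1]):
        print(point, "->", majority_vote(ensemble, point))
```

Using an odd number of binary classifiers keeps the voting rule well defined without a tie-breaking step, which is one plausible reason to combine five voters rather than four or six.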

Nicholas J. Davidson | Xueyi Wang
