SVM-based Method for Predicting Enzyme Function in a Hierarchical Context

Automatically assigning enzymes to classes in the Enzyme Commission (EC) hierarchy is crucial for understanding their specific molecular mechanisms. Standard machine learning methods such as the support vector machine (SVM) and the naive Bayes classifier have been successfully applied to this task. However, they treat each functional class independently and ignore the relationships between classes. In this paper, we develop an SVM-based method for predicting enzyme function in the hierarchical context of the EC system. Our method, which has low computational complexity, is a modified version of a structured prediction model, the Hierarchical Max-Margin Markov algorithm (HM³). HM³ is specifically designed for hierarchical multi-label classification and has been successfully used in many structured pattern recognition problems, such as document categorization, web content classification, and enzyme function prediction. As input features for our predictive model, we use the conjoint triad feature (CTF). Our method has been validated on a benchmark dataset of enzymes in which each protein has less than 40% sequence identity to any other protein in the same functional class. For the first three EC digits, the predictive accuracy and the Matthews correlation coefficient (MCC) of our method range from 78% to 100% and from 0.76 to 1, respectively. We therefore believe our method will be a useful supplementary tool for future studies in enzyme function prediction.

Keywords: Enzyme function prediction; Conjoint triad feature; Structured hierarchical output; Support vector machine.
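The abstract names the conjoint triad feature but does not define it. As a hedged illustration, the sketch below follows the commonly cited construction of Shen et al. (2007): the 20 standard amino acids are grouped into seven classes by dipole and side-chain volume, each window of three consecutive residues is mapped to one of 7³ = 343 class triads, and the triad counts f_i are normalized as d_i = (f_i − min f) / max f. The seven-class grouping and this normalization are assumptions taken from that construction, the paper's exact variant may differ, and the function name `conjoint_triad_features` is hypothetical.

```python
# Assumed seven-class grouping of the 20 standard amino acids
# (dipole / side-chain volume classes as commonly attributed to Shen et al., 2007).
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(AA_GROUPS) for aa in group}


def conjoint_triad_features(sequence: str) -> list[float]:
    """Return a hypothetical 343-dimensional conjoint triad vector for a protein sequence."""
    seq = sequence.upper()
    counts = [0] * 343  # one slot per (class, class, class) triad: 7 * 7 * 7
    for i in range(len(seq) - 2):
        triad = seq[i:i + 3]
        # Assumption: windows containing non-standard residues (e.g. 'X') are skipped.
        if all(aa in AA_TO_CLASS for aa in triad):
            a, b, c = (AA_TO_CLASS[aa] for aa in triad)
            counts[a * 49 + b * 7 + c] += 1
    max_f = max(counts)
    if max_f == 0:  # sequence too short or no valid triads
        return [0.0] * 343
    min_f = min(counts)
    # Assumed normalization: d_i = (f_i - min f) / max f
    return [(f - min_f) / max_f for f in counts]


vec = conjoint_triad_features("AGVC")
print(len(vec))  # -> 343
```

Every enzyme, whatever its length, yields one fixed-length 343-dimensional vector, which is the kind of input an SVM-style model such as the one described above can consume directly.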
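The abstract reports accuracy and MCC separately for each of the first three EC digits. A minimal sketch of how a per-class MCC can be computed at one level of the hierarchy, assuming EC numbers are stored as dotted strings, predictions are truncated to the same level, and each class is scored one-vs-rest with MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The helper names (`ec_prefix`, `one_vs_rest_counts`, `mcc`) and the toy labels are hypothetical; the paper's exact per-class accuracy definition is not given here, so only MCC is shown.

```python
import math


def ec_prefix(ec_number: str, level: int) -> str:
    """Truncate a dotted EC number such as '2.7.11.1' to its first `level` digits."""
    return ".".join(ec_number.split(".")[:level])


def one_vs_rest_counts(y_true, y_pred, label):
    """Count (TP, FP, TN, FN) for one class, treating every other class as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == label:
            if t == label:
                tp += 1
            else:
                fp += 1
        elif t == label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example with made-up labels (not data from the paper).
true_ec = ["1.1.1.1", "1.1.3.4", "2.7.1.1", "2.7.11.1"]
pred_ec = ["1.1.1.2", "1.2.1.1", "2.7.1.1", "3.1.1.1"]
level = 1  # score the first EC digit
y_true = [ec_prefix(e, level) for e in true_ec]
y_pred = [ec_prefix(e, level) for e in pred_ec]
print(mcc(*one_vs_rest_counts(y_true, y_pred, "1")))  # -> 1.0
```

Changing `level` to 2 or 3 scores the second or third EC digit in the same way, so a prediction can be correct at the first digit yet wrong at a deeper one.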
