Prediction of ketoacyl synthase family using reduced amino acid alphabets

Ketoacyl synthases are enzymes involved in fatty acid synthesis and can be classified into five families based on primary sequence similarity. Different families have different catalytic mechanisms. Cost-effective computational models that identify the family of a ketoacyl synthase would therefore help both enzyme engineering and the understanding of individual enzymes' catalytic mechanisms. In this work, a support vector machine-based method was developed to predict the ketoacyl synthase family using the n-peptide composition of reduced amino acid alphabets. In jackknife cross-validation, the model based on the 2-peptide composition of a reduced amino acid alphabet of size 13 yielded the best overall accuracy of 96.44%, with an average accuracy of 93.36%, which is superior to other state-of-the-art methods. This result suggests that n-peptide compositions of reduced amino acid alphabets are an efficient means of enzyme family classification and that the proposed model can be used efficiently for ketoacyl synthase family annotation.
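
As a rough illustration of the feature described above, the following Python sketch computes an n-peptide composition over a reduced alphabet. The grouping in `EXAMPLE_GROUPS` is a hypothetical 8-letter alphabet chosen only for illustration; it is not the size-13 alphabet used in this work, and the function names are likewise assumptions. Residues outside the 20 standard amino acids are simply dropped here, which is one possible choice rather than the documented procedure.

```python
from itertools import product

# Hypothetical reduced alphabet (8 groups), for illustration only.
# This is NOT the size-13 alphabet referred to in the abstract.
EXAMPLE_GROUPS = ["AVLIMC", "FWYH", "ST", "NQ", "DE", "KR", "G", "P"]


def build_mapping(groups):
    """Map every residue to the first letter of its group."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            mapping[residue] = representative
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues not covered by the mapping (e.g. 'X') are skipped;
    this is an assumption of the sketch.
    """
    return "".join(mapping[r] for r in sequence.upper() if r in mapping)


def n_peptide_composition(sequence, groups, n=2):
    """Return the frequency of each n-peptide of the reduced alphabet.

    The vector has len(groups) ** n entries, ordered as generated by
    itertools.product over the group representatives.
    """
    mapping = build_mapping(groups)
    reduced = reduce_sequence(sequence, mapping)
    letters = [group[0] for group in groups]
    keys = ["".join(p) for p in product(letters, repeat=n)]
    counts = dict.fromkeys(keys, 0)

    total = len(reduced) - n + 1  # number of overlapping n-peptides
    if total <= 0:
        return [0.0] * len(keys)
    for i in range(total):
        counts[reduced[i:i + n]] += 1
    return [counts[k] / total for k in keys]


if __name__ == "__main__":
    features = n_peptide_composition("MKVLAAGSTDE", EXAMPLE_GROUPS, n=2)
    print(len(features))
```

With a reduced alphabet of size 13 and n = 2, the same construction would give a 13 × 13 = 169-dimensional vector per sequence, which could then be passed to a support vector machine classifier.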
