Prediction of cis/trans isomerization using feature selection and support vector machines

In protein structures, the peptide bond adopts the trans conformation in the vast majority of cases; only a small fraction of peptide bonds are reported in the cis conformation. Most of these cis instances (>90%) occur when the peptide bond is an imide (X-Pro) rather than an amide (X-nonPro) bond. Because cis/trans isomerization is implicated in many biologically significant processes, accurate prediction of the peptide bond conformation is of high interest. In this study, we evaluate the effect of a wide range of features on the reliable prediction of both proline and non-proline cis/trans isomerization. The first feature vector comprises evolutionary profiles, secondary structure information, real-valued solvent accessibility predictions for each amino acid, and the physicochemical properties of the surrounding residues. We also explore the predictive impact of a modified feature vector, which consists of condensed position-specific scoring matrices (PSSMX), secondary structure and solvent accessibility. The best discriminating ability is achieved using the first feature vector combined with a wrapper feature selection algorithm and a support vector machine (SVM). The proposed method achieves 70% accuracy, 75% sensitivity and 71% positive predictive value (PPV) in predicting the peptide bond conformation between any two amino acids. We examine the output of the feature selection stage to identify discriminatory features and the contribution of each neighboring residue to the formation of the peptide bond, thereby advancing our understanding of cis/trans isomerization.
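
As a rough illustration of what a wrapper feature selection stage does, the sketch below runs a greedy forward search over feature indices. The abstract does not state which search strategy was used, so forward selection is an assumption here, and the names `forward_wrapper_selection`, `toy_score` and `min_gain` are hypothetical. In a real pipeline the `evaluate` callable would train an SVM on the candidate subset and return a cross-validated score; the toy scorer only stands in for that step.

```python
from typing import Callable, FrozenSet, List, Tuple


def forward_wrapper_selection(
    n_features: int,
    evaluate: Callable[[FrozenSet[int]], float],
    min_gain: float = 1e-9,
) -> Tuple[List[int], float]:
    """Greedy forward wrapper search over feature indices 0..n_features-1.

    `evaluate` scores a candidate subset. In practice it would wrap the
    classifier itself (e.g. cross-validated SVM accuracy); here it is
    left abstract so the search logic stays self-contained.
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))

    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [
            (evaluate(frozenset(selected + [f])), f) for f in sorted(remaining)
        ]
        # Highest score wins; ties go to the lowest feature index.
        top_score, top_feature = max(scored, key=lambda sf: (sf[0], -sf[1]))
        # Stop once no extension improves the score by at least min_gain.
        if top_score - best_score < min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score

    return selected, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical stand-in for a classifier score.

    Features 1 and 3 are treated as informative; every other feature
    costs a small penalty, mimicking noise added to the model.
    """
    informative = {1, 3}
    hits = len(subset & informative)
    noise = len(subset - informative)
    return 0.5 + 0.1 * hits - 0.01 * noise


if __name__ == "__main__":
    subset, score = forward_wrapper_selection(5, toy_score)
    print("selected features:", subset, "score:", round(score, 3))
```

Because the classifier is retrained for every candidate subset, a wrapper is costlier than a filter method, but it scores features by the criterion that actually matters, namely the predictive performance of the final model.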
