Intelligent Consensus Modeling for Proline Cis-Trans Isomerization Prediction

Proline cis-trans isomerization (CTI) plays a key role in the rate-determining steps of protein folding. Accurate prediction of proline CTI is important for understanding protein folding, splicing, cell signaling, and transmembrane active transport in humans and other animals. Our goal is to develop a state-of-the-art proline CTI predictor based on biophysically motivated intelligent consensus modeling that uses sequence information only (i.e., position-specific scores generated by PSI-BLAST). Current computational proline CTI predictors reach Q2 accuracies of about 70-73 percent and a Matthews correlation coefficient (Mcc) of about 0.40, using sequence-based evolutionary information together with predicted protein secondary structure information. Our approach, which combines a novel decision tree-based consensus model with a powerful randomized meta-learning technique, achieves 86.58 percent Q2 accuracy and 0.74 Mcc on the same proline CTI data set, exceeding the results of the existing computational proline CTI predictors reported in the literature.
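The two figures of merit quoted above can be made concrete with a minimal sketch. It assumes the standard definitions for a two-class (cis vs. trans) problem: Q2 is the fraction of residues classified correctly, and Mcc is the Matthews correlation coefficient computed from the confusion-matrix counts. The function name `q2_and_mcc` and the counts in the usage line are illustrative only and are not taken from the paper.

```python
import math


def q2_and_mcc(tp, tn, fp, fn):
    """Return (Q2, Mcc) from binary confusion-matrix counts.

    tp/fn: cis residues predicted as cis/trans (assumed labeling);
    tn/fp: trans residues predicted as trans/cis.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")

    # Q2: two-state accuracy, i.e. the fraction of correct predictions.
    q2 = (tp + tn) / total

    # Mcc: correlation between predicted and observed classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, Mcc is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q2, mcc


# Hypothetical counts, for illustration only.
q2, mcc = q2_and_mcc(tp=40, tn=45, fp=5, fn=10)
print(f"Q2 = {q2:.2%}, Mcc = {mcc:.3f}")
```

Reporting Mcc alongside Q2 matters when the classes are imbalanced: a predictor that always outputs the majority class can reach a high Q2 while its Mcc stays at zero.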
