Disulfide connectivity prediction based on structural information without a prior knowledge of the bonding state of cysteines

Previous studies predicted the disulfide bonding patterns of cysteines using prior knowledge of their bonding states. In this study, we propose a method based on an ensemble support vector machine (SVM) in which the structural features of cysteines are extracted without any prior knowledge of their bonding states, with the aim of improving the predictive performance for disulfide bonding patterns. For comparison, the proposed method was tested on SPX, the same dataset adopted in previous studies. The experimental results show that the ensemble SVM model achieves 96.5% accuracy in bridge classification and 89.2% accuracy in disulfide connectivity prediction, outperforming both the traditional method (51.5% and 51.0%, respectively) and a model based on a single-kernel SVM classifier (94.6% and 84.4%, respectively). For protein chain and residue classifications, the ensemble and single-kernel SVM approaches achieve higher sensitivity, specificity, and accuracy than the traditional methods. On these chain and residue classifications, the ensemble SVM and single-kernel models perform identically, indicating that the ensemble model can converge to the single-kernel model for some applications.
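To make the reported metrics concrete, the following is a minimal Python sketch of how a residue-level evaluation might be computed. It assumes the residue task is binary (bonded cysteine = positive, free cysteine = negative) and uses simple hard majority voting over member SVM outputs as one illustrative way to combine an ensemble. The abstract does not state how the ensemble SVM combines its members, so `majority_vote`, `confusion`, and `ConfusionCounts` are hypothetical names, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary counts; assumption: positive = bonded cysteine, negative = free."""
    tp: int  # bonded cysteines predicted as bonded
    tn: int  # free cysteines predicted as free
    fp: int  # free cysteines predicted as bonded
    fn: int  # bonded cysteines predicted as free


def majority_vote(member_predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hypothetical hard-voting combiner over 0/1 predictions from several
    classifiers (e.g. SVMs with different kernels). A residue is labeled 1
    only with a strict majority, so ties resolve to 0 (free)."""
    n_members = len(member_predictions)
    return [1 if 2 * sum(column) > n_members else 0
            for column in zip(*member_predictions)]


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally the four outcome types for paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _ratio(num: int, den: int) -> float:
    """Safe division: returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def sensitivity(c: ConfusionCounts) -> float:
    """TP / (TP + FN): fraction of bonded cysteines recovered."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """TN / (TN + FP): fraction of free cysteines recovered."""
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """(TP + TN) / total: fraction of all residues labeled correctly."""
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


if __name__ == "__main__":
    # Made-up predictions from three hypothetical member classifiers.
    members = [
        [1, 0, 1, 1, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 0, 0, 1, 0, 1],
    ]
    y_true = [1, 0, 1, 1, 0, 1]
    ensemble = majority_vote(members)
    counts = confusion(y_true, ensemble)
    print(ensemble, counts)
    print(sensitivity(counts), specificity(counts), accuracy(counts))
```

With the made-up data above, the vote yields `[1, 0, 1, 1, 0, 0]`, giving a sensitivity of 0.75, a specificity of 1.0, and an accuracy of 5/6. Requiring a strict majority keeps the combiner conservative: a cysteine is called bonded only when most members agree.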
