Prediction of disulfide bonding pattern based on a support vector machine and multiple trajectory search

Accurately predicting the connectivity pattern of disulfide bridges can significantly reduce the search space and thus helps to solve the protein-folding problem. An effective means of predicting disulfide connectivity patterns therefore facilitates the estimation of the three-dimensional structure of a protein and its function. To our knowledge, given prior knowledge of the bonding states of cysteines, the highest accuracy rate reported in the literature for predicting the overall disulfide connectivity pattern (Q_p) is 74.4% for dataset SP39, which is conventionally adopted for disulfide connectivity prediction. This work presents a novel classifier based on the support vector machine (SVM) that incorporates features of the position-specific scoring matrix (PSSM), normalized bond lengths, the predicted secondary structure of the protein, and indices for the physicochemical properties of amino acids. The SVM is trained to derive the connectivity probabilities of cysteine pairs. Additionally, an evolutionary algorithm called the multiple trajectory search (MTS) is integrated with the SVM model to tune the SVM parameters and the window sizes for the above features. The disulfide connectivity pattern is then identified by using the maximum weight perfect matching algorithm. Experimental results indicate that the accuracy rate for predicting the overall disulfide connectivity pattern (Q_p) reaches 79.8% when tested on the same dataset SP39.
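The final step, turning pairwise connectivity probabilities into a single connectivity pattern, can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a small, even number of bonded cysteines and simply enumerates every perfect matching instead of running a dedicated graph-matching algorithm. The names `perfect_matchings` and `best_pattern`, the cysteine positions, and the probability values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(nodes: List[int]) -> Iterator[List[Pair]]:
    """Yield every way of pairing up all nodes (assumes an even count)."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def best_pattern(
    cysteines: List[int],
    prob: Dict[FrozenSet[int], float],
) -> Tuple[List[Pair], float]:
    """Return the perfect matching with the largest summed pair weight."""
    if len(cysteines) % 2 != 0:
        raise ValueError("a perfect matching needs an even number of cysteines")
    best: List[Pair] = []
    best_score = float("-inf")
    for matching in perfect_matchings(cysteines):
        score = sum(prob[frozenset(pair)] for pair in matching)
        if score > best_score:
            best, best_score = matching, score
    return best, best_score


# Hypothetical sequence positions of four bonded cysteines.
cysteines = [5, 12, 30, 41]

# Hypothetical pair probabilities, standing in for the SVM outputs.
prob = {
    frozenset((5, 12)): 0.20,
    frozenset((5, 30)): 0.90,
    frozenset((5, 41)): 0.10,
    frozenset((12, 30)): 0.15,
    frozenset((12, 41)): 0.80,
    frozenset((30, 41)): 0.30,
}

pattern, score = best_pattern(cysteines, prob)
print(pattern)  # the pairing 5-30, 12-41 has the largest total weight
```

Exhaustive enumeration is only practical for a handful of disulfide bonds, because the number of candidate patterns grows as (2n-1)!! for n bonds; a polynomial-time maximum weight matching algorithm on the complete cysteine graph avoids that growth.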
