Improving disulfide connectivity prediction with sequential distance between oxidized cysteines

SUMMARY Accurate prediction of disulfide connectivity is a step towards solving the protein structure prediction problem. In this study, we propose a descriptor derived from the sequential distance between oxidized cysteines (denoted DOC). We further developed an approach that combines support vector machines (SVMs) with weighted graph matching to predict the disulfide connectivity pattern in proteins. With DOC, our SVM models achieved a prediction accuracy of 63%, which is significantly higher than that obtained by previous approaches. The results show that coupling the non-local descriptor DOC with local sequence profiles significantly improves prediction accuracy, and they demonstrate that DOC, with a proper scaling scheme, is an effective feature for disulfide connectivity prediction. The method developed in this work is available through the web server PreCys (prediction of cys-cys linkages of proteins).
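
As a rough illustration of the two ideas named in the summary, the sketch below computes an unscaled sequential distance for every cysteine pair and then picks the highest-scoring pairing of cysteines. It is a minimal sketch under stated assumptions, not the PreCys implementation: every cysteine in the toy sequence is assumed to be oxidized, the pair scores are made-up numbers standing in for SVM outputs, the scaling scheme mentioned in the summary is not reproduced, and exhaustive enumeration stands in for a proper maximum-weight matching algorithm. All function names are hypothetical.

```python
from itertools import combinations


def cysteine_positions(seq):
    """Return 0-based positions of all cysteines (assumed oxidized here)."""
    return [i for i, aa in enumerate(seq) if aa == "C"]


def sequential_distances(positions):
    """Unscaled sequence separation for every cysteine pair, keyed by
    the pair of cysteine indices (a, b) with a < b."""
    return {(a, b): positions[b] - positions[a]
            for a, b in combinations(range(len(positions)), 2)}


def perfect_matchings(items):
    """Yield every way to split `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def best_connectivity(n_cys, pair_score):
    """Pick the pairing with the largest total pair score.
    Brute force; only practical for a handful of cysteines."""
    return max(perfect_matchings(list(range(n_cys))),
               key=lambda m: sum(pair_score[p] for p in m))


# Toy sequence with four cysteines (hypothetical, not a real protein).
seq = "ACDCEFCGHC"
positions = cysteine_positions(seq)
distances = sequential_distances(positions)

# Made-up pair scores standing in for classifier outputs.
toy_scores = {(0, 1): 0.1, (0, 2): 0.8, (0, 3): 0.3,
              (1, 2): 0.2, (1, 3): 0.7, (2, 3): 0.4}
pattern = best_connectivity(len(positions), toy_scores)
```

In a real predictor the distances would be scaled and fed, together with local sequence profiles, into the classifier that produces the pair scores; here the two steps are kept separate only to keep the sketch short.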
