Protein disulfide topology determination through the fusion of mass spectrometric analysis and sequence-based prediction using Dempster-Shafer theory

Background: Disulfide bonds are among the most important cross-linkages in proteins and significantly influence protein structure and function. Various methodological frameworks have been proposed for identifying disulfide bonds. These include, among others, mass spectrometry-based methods, sequence-based predictive approaches, and techniques such as crystallography and NMR. Each framework has advantages and disadvantages in terms of prerequisites for applicability, throughput, and accuracy. Furthermore, the results from different methods may partly concur or conflict.

Results: In this paper, we propose a novel and theoretically rigorous framework for disulfide bond determination based on information fusion from different methods using an extended formulation of Dempster-Shafer theory. A key advantage of our approach is that it automatically handles concurring as well as conflicting evidence in a data-driven manner. Using the proposed framework, we have developed a method for disulfide bond determination that combines results from sequence-based prediction and mass spectrometric inference. This method determines disulfide bonds more accurately than either constituent method taken individually. Furthermore, experiments indicate that the method improves the accuracy of bond identification compared with leading existing methods. Finally, the proposed framework is extensible in that results from any number of approaches can be incorporated. Results obtained using this framework can be especially useful in cases where the complexity of the bonding patterns, coupled with specificities of the fragmentation pattern or limitations of computational models, prevents any single method from performing consistently across a diverse set of molecules.
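To make the idea of evidence fusion concrete, the following is a minimal sketch of the classic Dempster rule of combination applied to a single candidate cysteine pair. It is not the extended formulation used in the paper. The hypothesis labels (`BOND`, `NO_BOND`), the function name `combine`, and all mass values are assumptions chosen purely for illustration; they are not results from the paper.

```python
from itertools import product

# Hypothetical frame of discernment for one candidate cysteine pair.
BOND = frozenset({"bond"})
NO_BOND = frozenset({"no_bond"})
FRAME = BOND | NO_BOND  # mass placed here expresses ignorance


def combine(m1, m2):
    """Classic Dempster rule for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function are assumed to sum to 1.
    Returns the fused mass function and the conflict mass K.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence: mass goes to the intersection.
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            # Contradictory evidence: mass counts toward conflict.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {k: v / norm for k, v in fused.items()}, conflict


# Illustrative (made-up) masses from two sources of evidence.
ms_evidence = {BOND: 0.7, FRAME: 0.3}                 # mass spectrometric inference
seq_evidence = {BOND: 0.6, NO_BOND: 0.3, FRAME: 0.1}  # sequence-based prediction

fused, conflict = combine(ms_evidence, seq_evidence)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
print("conflict:", round(conflict, 4))
```

With these illustrative inputs, the two sources mostly agree, so the fused belief in the bond exceeds what either source assigns alone, while the contradictory mass is normalized away. Because the rule is associative, a third source could be folded in by calling `combine` again on the fused result, which mirrors the extensibility described above.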
