An efficient algorithmic approach for mass spectrometry-based disulfide connectivity determination using multi-ion analysis

Background: Determining the disulfide (S-S) bond pattern in a protein is often crucial for understanding its structure and function. In recent research, mass spectrometry (MS)-based analysis has been applied to this problem following protein digestion under both partial-reduction and non-reduction conditions. However, this paradigm still awaits solutions to certain algorithmic problems, fundamental among which is the efficient matching of an exponentially growing set of putative S-S bonded structural alternatives to large amounts of experimental spectrometric data. Current methods circumvent this challenge primarily through simplifications, such as assuming that only certain ion types occur (b-ions and y-ions, which predominate in the more popular dissociation methods such as collision-induced dissociation, CID). Unfortunately, this can adversely impact the quality of results.

Method: We present an algorithmic approach to this problem that can, with high computational efficiency, analyze multiple ion types (a, b, bo, b*, c, x, y, yo, y*, and z) and deal with complex bonding topologies, such as inter- and intra-peptide bonding involving more than two peptides. The proposed approach combines an approximation algorithm-based search formulation with data-driven parameter estimation. This formulation considers only those regions of the search space where the correct solution resides with high likelihood. The putative disulfide bonds thus obtained are finally combined into a globally consistent pattern to yield the overall disulfide bonding topology of the molecule. Additionally, each bond is associated with a confidence score, which aids in the interpretation and assimilation of the results.

Results: The method was tested on nine different eukaryotic glycosyltransferases possessing disulfide bonding topologies of varying complexity. Its performance was characterized by high efficiency (in terms of time and the fraction of the search space considered), sensitivity, specificity, and accuracy. The method was also compared with other state-of-the-art techniques and was found to perform as well as or better than them. An implementation is available at: http://tintin.sfsu.edu/~whemurad/disulfidebond.

Conclusions: This research addresses some of the significant challenges in MS-based disulfide bond determination. To the best of our knowledge, this is the first algorithmic work that can consider multiple ion types in this problem setting while simultaneously ensuring polynomial time complexity and high accuracy of results.
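The step in which scored putative bonds are combined into a globally consistent pattern can be illustrated with a small sketch. It assumes that "globally consistent" means each cysteine takes part in at most one bond and that the preferred pattern is the one with the highest total confidence score; the abstract does not spell out either point. The function name `best_bond_pattern`, the cysteine indexing, and the example scores are hypothetical. The search below is an exhaustive, memoized enumeration that is only practical for a handful of cysteines. It is not the polynomial-time procedure the abstract refers to.

```python
from functools import lru_cache


def best_bond_pattern(n_cys, scores):
    """Pick the highest-scoring set of bonds that uses each cysteine at most once.

    n_cys  -- number of cysteines, indexed 0 .. n_cys - 1 (hypothetical indexing)
    scores -- dict mapping (i, j) with i < j to a positive confidence score
    Returns (total_score, bonds), where bonds is a tuple of (i, j) pairs.
    """

    @lru_cache(maxsize=None)
    def solve(used):
        # 'used' is a bitmask of cysteines that are already decided.
        i = 0
        while i < n_cys and (used >> i) & 1:
            i += 1
        if i == n_cys:
            return 0.0, ()

        # Option 1: leave cysteine i unbonded (a free thiol).
        best = solve(used | (1 << i))

        # Option 2: bond cysteine i to any still-free partner j.
        for j in range(i + 1, n_cys):
            if not (used >> j) & 1 and (i, j) in scores:
                sub_score, sub_bonds = solve(used | (1 << i) | (1 << j))
                candidate = (scores[(i, j)] + sub_score, ((i, j),) + sub_bonds)
                if candidate[0] > best[0]:
                    best = candidate
        return best

    return solve(0)


# Illustrative scores only: the single best bond (1, 2) conflicts with the
# best overall pattern, so picking bonds greedily would be suboptimal here.
example_scores = {(0, 1): 0.9, (1, 2): 0.95, (2, 3): 0.8, (0, 3): 0.3}
total, bonds = best_bond_pattern(4, example_scores)
print(bonds)  # ((0, 1), (2, 3))
```

In this example the highest-scoring individual bond, (1, 2), is left out because accepting it would leave only the weak (0, 3) bond for the remaining cysteines. Resolving such conflicts across all candidate bonds at once is the purpose of a global combination step.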
