Multi-spectra peptide sequencing and its applications to multistage mass spectrometry

Despite a recent surge of interest in database-independent peptide identification, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on the chance presence of spectra from overlapping peptides. On the other hand, while multistage mass spectrometry (collecting multiple MS3 spectra from each MS2 spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it to multistage mass spectrometry. Our software yields both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS3 spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with accurate parent mass (but inaccurate fragment masses), a protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped to solve this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves sequencing accuracy on both tandem and multistage mass spectrometry.

Availability: The open-source implementation of our software is available at http://proteomics.bioprojects.org.

Contact: bandeira@ucsd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
