Spectral Profiles, a Novel Representation of Tandem Mass Spectra and Their Applications for De Novo Peptide Sequencing and Identification* □ S

Despite many efforts in the last decade, the progress in de novo peptide sequencing has been slow with only 30–45% of all peptides correctly reconstructed. We argue that accurate full-length peptide sequencing may be an unattainable goal for some spectra and demonstrate how to accurately sequence gapped peptides instead. We further argue that gapped peptides are nearly as useful as full-length peptides for error-tolerant database searches. Gapped peptides occupy a niche between long but inaccurate full-length reconstructions and short but accurate peptide sequence tags. Our MS-Profile tool uses spectral profiles, a new representation of tandem mass spectra, to generate gapped peptides that are longer and more accurate than peptide sequence tags of length 3 traditionally used to speed up database searches in proteomics. In addition, spectral profiles also enable intuitive visualization of all high scoring de novo reconstructions of tandem mass spectra.

[1]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[2]  D. N. Perkins,et al.  Probability‐based protein identification by searching sequence databases using mass spectrometry data , 1999, Electrophoresis.

[3]  J. Yates,et al.  GutenTag: high-throughput sequence tagging via an empirically derived fragmentation model. , 2003, Analytical chemistry.

[4]  David Goldberg,et al.  Lookup peaks: a hybrid of de novo sequencing and database search for protein identification by tandem mass spectrometry. , 2007, Analytical chemistry.

[5]  A. Shevchenko,et al.  MultiTag: multiple error-tolerant sequence tag search for the sequence-similarity identification of proteins by mass spectrometry. , 2003, Analytical chemistry.

[6]  Durbin,et al.  Biological Sequence Analysis , 1998 .

[7]  Pavel A. Pevzner,et al.  Peptide Sequence Tags for Fast Database Search in Mass-Spectrometry , 2005, RECOMB.

[8]  Dekel Tsur,et al.  Identification of post-translational modifications by blind search of mass spectra , 2005, Nature Biotechnology.

[9]  Ming Li,et al.  PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. , 2003, Rapid communications in mass spectrometry : RCM.

[10]  Sean L Seymour,et al.  The Paragon Algorithm, a Next Generation Search Engine That Uses Sequence Temperature Values and Feature Probabilities to Identify Peptides from Tandem Mass Spectra*S , 2007, Molecular & Cellular Proteomics.

[11]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[12]  P. Pevzner,et al.  PepNovo: de novo peptide sequencing via probabilistic network modeling. , 2005, Analytical chemistry.

[13]  P. Bork,et al.  Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching. , 2001, Analytical chemistry.

[14]  Bin Ma,et al.  SPIDER: software for protein identification from sequence tags with de novo sequencing error , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[15]  Ruedi Aebersold,et al.  The standard protein mix database: a diverse data set to assist in the production of improved Peptide and protein identification software tools. , 2008, Journal of proteome research.

[16]  Ari M Frank,et al.  A ranking-based scoring function for peptide-spectrum matches. , 2009, Journal of proteome research.

[17]  Bin Ma,et al.  SPIDER: software for protein identification from sequence tags with de novo sequencing error. , 2004, Proceedings. IEEE Computational Systems Bioinformatics Conference.

[18]  J. A. Taylor,et al.  Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry. , 2001, Analytical chemistry.

[19]  J. A. Taylor,et al.  Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. , 1997, Rapid communications in mass spectrometry : RCM.

[20]  Alexey I Nesvizhskii,et al.  Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. , 2002, Analytical chemistry.

[21]  Nuno Bandeira,et al.  Multi-spectra peptide sequencing and its applications to multistage mass spectrometry , 2008, ISMB.

[22]  M. Wilm,et al.  Error-tolerant identification of peptides in sequence databases by peptide sequence tags. , 1994, Analytical chemistry.

[23]  Vincent J. Denef,et al.  Implications of strain- and species-level sequence divergence for community and isolate shotgun proteomic analysis. , 2007, Journal of proteome research.

[24]  P. Pevzner,et al.  InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. , 2005, Analytical chemistry.

[25]  Yi-Kuo Yu,et al.  Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics , 2005, Bioinform..

[26]  P. Pevzner,et al.  Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. , 2008, Journal of proteome research.

[27]  A. D. McLachlan,et al.  Profile analysis: detection of distantly related proteins. , 1987, Proceedings of the National Academy of Sciences of the United States of America.

[28]  B. Searle,et al.  High-throughput identification of proteins and unanticipated sequence modifications using a mass-based alignment algorithm for MS/MS de novo sequencing results. , 2004, Analytical chemistry.

[29]  C. Bartels Fast algorithm for peptide sequencing by mass spectroscopy. , 1990, Biomedical & environmental mass spectrometry.

[30]  J. Yates,et al.  An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database , 1994, Journal of the American Society for Mass Spectrometry.