Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics

MOTIVATION: The key to MS-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether by library search or de novo, is to infer statistical significance more reliably and to reduce noise more effectively. Because the noise in a spectrum depends on experimental conditions, the instrument used and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome these issues.

RESULTS: We designed RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing step followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential for de novo sequencing alone and the ease of incorporating post-translational modifications.
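The idea of judging library hits against a spectrum-specific background can be illustrated with a minimal sketch. It is not RAId's actual statistical model, which the abstract does not specify. It assumes that the scores of de novo candidates for one spectrum are available as a plain list and uses a simple empirical tail fraction as a stand-in for the background statistics. The names `empirical_p_value` and `significant_hits`, and the threshold `alpha`, are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(background_scores, hit_score):
    """Estimate how often the background reaches hit_score or higher.

    background_scores: scores of de novo candidates for ONE spectrum
    (assumed input; stands in for the spectrum-specific background).
    Add-one smoothing keeps the estimate above zero.
    """
    ordered = sorted(background_scores)
    n_at_least = len(ordered) - bisect_left(ordered, hit_score)
    return (n_at_least + 1) / (len(ordered) + 1)


def significant_hits(background_scores, library_hits, alpha=0.01):
    """Return (peptide, score, p_value) for library hits with p_value <= alpha.

    library_hits: iterable of (peptide, score) pairs from the library search.
    An empty result corresponds to the case where the top-ranking de novo
    sequences would be considered as candidates instead.
    """
    scored = [
        (peptide, score, empirical_p_value(background_scores, score))
        for peptide, score in library_hits
    ]
    kept = [entry for entry in scored if entry[2] <= alpha]
    return sorted(kept, key=lambda entry: entry[2])


if __name__ == "__main__":
    background = list(range(1, 100))          # toy background scores
    hits = [("PEPTIDEK", 100), ("SAMPLER", 50)]  # toy library hits
    print(significant_hits(background, hits))
```

Because the background is rebuilt for every spectrum, the same raw score can be significant in a clean spectrum and insignificant in a noisy one, which is the behaviour the abstract attributes to spectrum-specific statistics.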
