Comparison of different search engines using validated MS/MS test datasets

High-throughput proteomics studies produce massive numbers of tandem mass spectra, and manual interpretation of these spectra is not feasible. Instead, search engines are used to match the tandem mass spectra against sequence information contained in proteomic and genomic databases. Typically, these search engines provide a list of the best-matching peptide sequences for each tandem mass spectrum, together with scores that are loosely related to the confidence level of the match. Many search engines for peptide tandem mass spectra have been reported. Their results differ substantially depending on the type of mass spectrometer used and on the input parameters. Here we describe a comparative analysis of different search engines using validated test sets of tandem mass spectra. We defined test sets of MS/MS spectra derived from high-throughput proteomics experiments performed by HPLC-ESI-MS/MS on an ion trap mass spectrometer (LCQ) and on a tandem quadrupole time-of-flight mass spectrometer with pulsar functionality (Qstar Pulsar). We analyzed the ability of the different search engines to identify the correct peptides, as well as the cross-validation of results between the search engines.
