Probabilistic consensus scoring improves tandem mass spectrometry peptide identification.

Database search is a standard technique for identifying peptides from their tandem mass spectra. To increase the number of correctly identified peptides, we suggest a probabilistic framework that allows the combination of scores from different search engines into a joint consensus score. Central to the approach is a novel method to estimate scores for peptides not found by an individual search engine. This approach allows the estimation of p-values for each candidate peptide and their combination across all search engines. The consensus approach works better than any single search engine across all different instrument types considered in this study. Improvements vary strongly from platform to platform and from search engine to search engine. Compared to the industry standard MASCOT, our approach can identify up to 60% more peptides. The software for consensus predictions is implemented in C++ as part of OpenMS, a software framework for mass spectrometry. The source code is available in the current development version of OpenMS and can easily be used as a command line application or via a graphical pipeline designer TOPPAS.

[1]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[2]  J. Yates,et al.  An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database , 1994, Journal of the American Society for Mass Spectrometry.

[3]  D. N. Perkins,et al.  Probability‐based protein identification by searching sequence databases using mass spectrometry data , 1999, Electrophoresis.

[4]  A. Burlingame,et al.  Functional Assignment of the 20 S Proteasome from Trypanosoma brucei Using Mass Spectrometry and New Bioinformatics Approaches* , 2001, The Journal of Biological Chemistry.

[5]  P. Bork,et al.  Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching. , 2001, Analytical chemistry.

[6]  J. Yates,et al.  An automated multidimensional protein identification technology for shotgun proteomics. , 2001, Analytical chemistry.

[7]  A. Nesvizhskii,et al.  Experimental protein mixture for validating tandem mass spectral analysis. , 2002, Omics : a journal of integrative biology.

[8]  Alexey I Nesvizhskii,et al.  Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. , 2002, Analytical chemistry.

[9]  R. Beavis,et al.  A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. , 2003, Analytical chemistry.

[10]  A. Masselot,et al.  OLAV: Towards high‐throughput tandem mass spectrometry data identification , 2003, Proteomics.

[11]  R. Aebersold,et al.  Mass spectrometry-based proteomics , 2003, Nature.

[12]  S. Bryant,et al.  Open mass spectrometry search algorithm. , 2004, Journal of proteome research.

[13]  Robertson Craig,et al.  TANDEM: matching proteins with tandem mass spectra. , 2004, Bioinformatics.

[14]  M. Mann,et al.  The abc's (and xyz's) of peptide sequencing , 2004, Nature Reviews Molecular Cell Biology.

[15]  F. Leisch FlexMix: A general framework for finite mixture models and latent class regression in R , 2004 .

[16]  A. Makarov,et al.  The Orbitrap: a new mass spectrometer. , 2005, Journal of mass spectrometry : JMS.

[17]  Gilbert S Omenn,et al.  An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: Sensitivity and specificity analysis , 2005, Proteomics.

[18]  M. Mann,et al.  Parts per Million Mass Accuracy on an Orbitrap Mass Spectrometer via Lock Mass Injection into a C-trap*S , 2005, Molecular & Cellular Proteomics.

[19]  Lennart Martens,et al.  PRIDE: The proteomics identifications database , 2005, Proteomics.

[20]  P. Pevzner,et al.  InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. , 2005, Analytical chemistry.

[21]  Lennart Martens,et al.  PRIDE: a public repository of protein and peptide identifications for the proteomics community , 2005, Nucleic Acids Res..

[22]  Monica Riley,et al.  Escherichia coli K-12: a cooperatively developed annotation snapshot—2005 , 2006, Nucleic acids research.

[23]  Steven P Gygi,et al.  Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry , 2007, Nature Methods.

[24]  Roy D. Welch,et al.  Complete genome sequence of the myxobacterium Sorangium cellulosum , 2007, Nature Biotechnology.

[25]  Nuno Bandeira,et al.  Spectral networks: a new approach to de novo discovery of protein sequences and posttranslational modifications. , 2007, BioTechniques.

[26]  Knut Reinert,et al.  TOPP - the OpenMS proteomics pipeline , 2007, Bioinform..

[27]  Knut Reinert,et al.  OpenMS – An open-source software framework for mass spectrometry , 2008, BMC Bioinformatics.

[28]  Alexey I Nesvizhskii,et al.  Analysis and validation of proteomic data generated by tandem mass spectrometry , 2007, Nature Methods.

[29]  N. Edwards,et al.  Novel peptide identification from tandem mass spectra using ESTs and sequence database compression , 2007, Molecular systems biology.

[30]  William Stafford Noble,et al.  Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. , 2008, Journal of proteome research.

[31]  B. Searle,et al.  Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies. , 2008, Journal of proteome research.

[32]  William Stafford Noble,et al.  Posterior error probabilities and false discovery rates: two sides of the same coin. , 2008, Journal of proteome research.

[33]  Ruedi Aebersold,et al.  The standard protein mix database: a diverse data set to assist in the production of improved Peptide and protein identification software tools. , 2008, Journal of proteome research.

[34]  Hyungwon Choi,et al.  Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics. , 2008, Journal of Proteome Research.

[35]  Matthias Mann,et al.  Bioinformatics analysis of mass spectrometry‐based proteomics data sets , 2009, FEBS letters.

[36]  Xue Wu,et al.  An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra , 2009, Clinical Proteomics.

[37]  Oliver Kohlbacher,et al.  De novo peptide sequencing by tandem MS using complementary CID and electron transfer dissociation , 2009, Electrophoresis.

[38]  Ravi Tharakan,et al.  Data maximization by multipass analysis of protein mass spectra , 2010, Proteomics.