Combining high resolution and exact calibration to boost statistical power: A well-calibrated score function for high-resolution MS2 data

To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine’s scores are well calibrated (i.e., that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum) has proven to be challenging. Here, we describe a database search score function, the “residue evidence” (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a “combined p-value” score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p-value to the score functions used by several existing search engines. Our results suggest that the combined p-value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).
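
The abstract does not state the combination rule, so the following is only a minimal sketch of one standard way to merge two calibrated p-values: take their product and convert it back into a p-value. It assumes the two p-values are independent and uniformly distributed under the null hypothesis, which may not hold for res-ev and XCorr scores computed on the same spectrum; the function name `combine_two_pvalues` is hypothetical and is not part of Crux or Tide.

```python
import math


def combine_two_pvalues(p1: float, p2: float) -> float:
    """Combine two p-values via their product (illustrative sketch only).

    Assumption: p1 and p2 are independent and Uniform(0, 1) under the null.
    Under that assumption, P(p1 * p2 <= x) = x * (1 - ln x) for 0 < x <= 1,
    which is equivalent to Fisher's method applied to two p-values.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
    x = p1 * p2
    if x == 0.0:
        # The limit of x * (1 - ln x) as x -> 0 is 0.
        return 0.0
    return x * (1.0 - math.log(x))


if __name__ == "__main__":
    # Hypothetical p-values for one peptide-spectrum match.
    res_ev_p = 0.05
    xcorr_p = 0.05
    print(combine_two_pvalues(res_ev_p, xcorr_p))
```

Note that the combined value is larger than the raw product: the correction term accounts for the fact that a small product can arise from many different pairs of p-values. If the two scores are correlated, this formula is anticonservative and a correlation-aware adjustment would be needed.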
