Improving Peptide-Level Mass Spectrometry Analysis via Double Competition

The analysis of shotgun proteomics data often involves generating lists of inferred peptide-spectrum matches (PSMs), of peptides, or of both. The canonical approach to generating these discovery lists is to control the false discovery rate (FDR), most commonly through target-decoy competition (TDC). At the PSM level, TDC is implemented by competing each spectrum's best-scoring target (real) peptide match against its best match in a decoy database. This PSM-level procedure can be adapted to the peptide level by selecting the top-scoring PSM per peptide prior to FDR estimation. Here, we first highlight and empirically augment a little-known previous work by He et al., which showed that TDC-based PSM-level FDR estimates can be liberally biased. We therefore propose that researchers focus on peptide-level analysis instead. We then investigate three ways to carry out peptide-level TDC and show that the most common method ("PSM-only") offers the lowest statistical power in practice. An alternative approach that carries out a double competition, first at the PSM level and then at the peptide level ("PSM-and-peptide"), is the most powerful method, yielding on average 17% more discovered peptides at a 1% FDR threshold than the PSM-only method.
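
The double competition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `double_competition`, the input layout, the `decoy_of` pairing of each target peptide with one decoy peptide, the tie-breaking rule, and the use of the common `(decoys + 1) / targets` FDR estimate are all assumptions made for illustration.

```python
def double_competition(psms, decoy_of, fdr=0.01):
    """Hypothetical sketch of a "PSM-and-peptide" double competition.

    psms     -- iterable of (spectrum_id, peptide, score); higher score is better
    decoy_of -- dict mapping each target peptide to its paired decoy peptide
                (assumed pairing, e.g. a shuffled or reversed sequence)
    fdr      -- FDR threshold applied to the peptide list
    """
    # Step 1: PSM-level competition. Each spectrum keeps only its single
    # best-scoring match, whether that match is a target or a decoy.
    best_per_spectrum = {}
    for spectrum, peptide, score in psms:
        current = best_per_spectrum.get(spectrum)
        if current is None or score > current[1]:
            best_per_spectrum[spectrum] = (peptide, score)

    # Step 2: each peptide is scored by its best surviving PSM.
    peptide_score = {}
    for peptide, score in best_per_spectrum.values():
        if peptide not in peptide_score or score > peptide_score[peptide]:
            peptide_score[peptide] = score

    # Step 3: peptide-level competition between each target and its decoy.
    # Assumption: ties go to the decoy, which is the conservative choice.
    winners = []  # entries are (score, is_decoy, peptide)
    for target, decoy in decoy_of.items():
        t = peptide_score.get(target)
        d = peptide_score.get(decoy)
        if t is None and d is None:
            continue  # neither peptide survived the PSM-level competition
        if d is None or (t is not None and t > d):
            winners.append((t, False, target))
        else:
            winners.append((d, True, decoy))

    # Step 4: walk down the ranked winners and keep the longest prefix whose
    # estimated FDR, (decoys + 1) / targets, stays at or below the threshold.
    winners.sort(key=lambda w: w[0], reverse=True)
    n_targets = n_decoys = cutoff = 0
    for rank, (_, is_decoy, _) in enumerate(winners, start=1):
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        if (n_decoys + 1) / max(n_targets, 1) <= fdr:
            cutoff = rank
    return [pep for _, is_decoy, pep in winners[:cutoff] if not is_decoy]


# Toy data: three spectra, each with one target and one decoy candidate.
toy_psms = [
    ("s1", "A", 5.0), ("s1", "a", 1.0),
    ("s2", "B", 4.0), ("s2", "b", 2.0),
    ("s3", "C", 1.0), ("s3", "c", 3.0),
]
toy_pairs = {"A": "a", "B": "b", "C": "c"}
discovered = double_competition(toy_psms, toy_pairs, fdr=0.5)
```

In this toy input the threshold is set to 0.5 only because the list is tiny; with so few peptides, the `+ 1` in the estimate prevents any discovery at a 1% threshold.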
