Unbiased False Discovery Rate Estimation for Shotgun Proteomics Based on the Target-Decoy Approach.

The target-decoy approach (TDA) is the dominant strategy for false discovery rate (FDR) estimation in mass-spectrometry-based proteomics. One of its main applications is direct FDR estimation by counting decoy matches above a given score threshold, and the corresponding equations are widely used to filter peptide and protein identifications. In this work, we consider a probabilistic model of the filtering process and find that, when decoy counting is used for q-value estimation and subsequent filtering, these common equations for TDA-based FDR estimation require a correction. We also discuss the magnitude of the variance of the false discovery proportion (FDP) and propose using confidence intervals for more conservative FDP estimation in shotgun proteomics. The need for both the correction and the confidence intervals is especially pronounced when filtering small sets of identifications (such as in proteogenomics experiments) and when using very low FDR thresholds.
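
The decoy-counting estimate and the q-value filtering built on it can be sketched in Python. This is an illustration under stated assumptions, not the authors' code. Higher scores are assumed to be better, and the target and decoy databases are assumed to be of equal size. The `correction` parameter is hypothetical: it is a constant added to the decoy count. A value of 0 gives the plain decoys/targets estimate and a value of 1 gives the common "+1" form. Whether that form matches the exact correction derived in this work should be checked against the full text.

```python
from typing import List, Sequence


def q_values(scores: Sequence[float], is_decoy: Sequence[bool],
             correction: int = 0) -> List[float]:
    """Return a q value for every match, in input order (sketch; higher score = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    # FDR at the threshold equal to each match's score: (D + correction) / T.
    # `correction` is a hypothetical knob, not a documented API.
    fdr_at = [1.0] * n
    targets = decoys = 0
    for rank, i in enumerate(order):
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr_at[rank] = min(1.0, (decoys + correction) / targets) if targets else 1.0
    # Tied scores form a single threshold, so they share the counts at the end of the tie.
    for rank in range(n - 2, -1, -1):
        if scores[order[rank]] == scores[order[rank + 1]]:
            fdr_at[rank] = fdr_at[rank + 1]
    # q value: the lowest FDR over all thresholds that still accept the match.
    qs = [1.0] * n
    best = 1.0
    for rank in range(n - 1, -1, -1):
        best = min(best, fdr_at[rank])
        qs[order[rank]] = best
    return qs


def filter_targets(scores: Sequence[float], is_decoy: Sequence[bool],
                   fdr_threshold: float, correction: int = 0) -> List[int]:
    """Indices of target matches whose q value does not exceed the threshold."""
    qs = q_values(scores, is_decoy, correction)
    return [i for i, q in enumerate(qs) if q <= fdr_threshold and not is_decoy[i]]
```

Consider a toy list with scores `[10, 9, 8, 7, 6, 5]` and decoy flags `[False, False, False, True, False, True]`. At a 1% threshold, the uncorrected estimate accepts the top three targets (`[0, 1, 2]`), and the "+1" variant accepts none. With the "+1" form, no threshold can report an FDR below 1/T, so a set with fewer than 100 targets can never pass a 1% cut. This makes small sets and very low thresholds the cases where the choice of equation matters most.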

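One way to turn a decoy count into a conservative FDP estimate is sketched below. It is a swapped-in Bayesian construction, not necessarily the confidence interval proposed in this work. It rests on three assumptions: a 1:1 target/decoy ratio, each false match being equally likely to land in the target or the decoy database, and a flat prior on the number of false matches. Under those assumptions, the number of false targets F above the threshold, given D decoys, follows a negative binomial distribution with D + 1 successes and p = 0.5. Its mean is therefore D + 1, and an upper quantile gives a conservative count of false targets. All function names are hypothetical.

```python
import math


def false_target_pmf(k: int, decoys: int) -> float:
    """P(F = k | D) = C(k + D, k) * 0.5 ** (k + D + 1), computed in log space.

    Negative binomial with D + 1 successes and p = 0.5; this follows from the
    assumed 1:1 target/decoy model with a flat prior on the number of false matches.
    """
    log_p = (math.lgamma(k + decoys + 1) - math.lgamma(k + 1)
             - math.lgamma(decoys + 1) - (k + decoys + 1) * math.log(2))
    return math.exp(log_p)


def false_target_upper_bound(decoys: int, confidence: float = 0.95) -> int:
    """Smallest k such that P(F <= k | D) >= confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    cumulative, k = 0.0, 0
    while True:
        cumulative += false_target_pmf(k, decoys)
        if cumulative >= confidence:
            return k
        k += 1


def fdp_upper_bound(targets: int, decoys: int, confidence: float = 0.95) -> float:
    """Conservative FDP: upper bound on false targets divided by accepted targets.

    Simplification: the model does not cap F at the number of targets,
    so the ratio is clipped to 1.
    """
    if targets == 0:
        return 1.0
    return min(1.0, false_target_upper_bound(decoys, confidence) / targets)
```

For example, `fdp_upper_bound(100, 0)` returns 0.04. Even with no decoys above the threshold, 100 accepted targets support only a 4% upper bound on FDP at 95% confidence under this model. This shows how wide the uncertainty is for small filtered sets.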