A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics

Motivation: Target-decoy search (TDS) is currently the most popular strategy for estimating and controlling the false discovery rate (FDR) of peptide identifications in mass spectrometry-based shotgun proteomics. While this strategy is very useful in practice and has been studied intensively in empirical work, its theoretical foundation has not yet been well established.

Results: In this work, we systematically analyze the TDS strategy in a rigorous statistical sense. We prove that the commonly used concatenated TDS provides a conservative estimate of the FDR for any given score threshold, but that it cannot rigorously control the FDR. We prove that, with a slight modification to the commonly used formula for FDR estimation, the peptide-level FDR can be rigorously controlled based on the concatenated TDS. We show that controlling the FDR at the spectrum level is difficult. We verify the theoretical conclusions with real mass spectrometry data.
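The distinction between estimating the FDR at a fixed threshold and controlling it when the threshold is chosen from the data can be illustrated with a small sketch. The abstract does not state the formulas, so the following are assumptions: the commonly used estimate is taken to be D/T, the number of decoy matches divided by the number of target matches at or above the threshold, and the "slight modification" is taken to be the widely known +1 variant, (D+1)/T. The function names and the toy scores are hypothetical.

```python
from typing import List, Optional, Tuple

# Each identification is a (score, is_decoy) pair from a concatenated search.
Match = Tuple[float, bool]


def estimate_fdr(matches: List[Match], threshold: float,
                 plus_one: bool = False) -> float:
    """Estimate the FDR among target matches scoring >= threshold.

    Assumed formulas (not given in the abstract):
      plus_one=False -> D / T
      plus_one=True  -> (D + 1) / T
    """
    targets = sum(1 for s, is_decoy in matches if s >= threshold and not is_decoy)
    decoys = sum(1 for s, is_decoy in matches if s >= threshold and is_decoy)
    numerator = decoys + 1 if plus_one else decoys
    if targets == 0:
        # No accepted targets: report 1.0 if any false positives are implied.
        return 1.0 if numerator > 0 else 0.0
    return min(1.0, numerator / targets)


def lowest_threshold(matches: List[Match], alpha: float,
                     plus_one: bool = False) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= alpha.

    Returns None when no observed score satisfies the requested level.
    """
    for t in sorted({s for s, _ in matches}):
        if estimate_fdr(matches, t, plus_one) <= alpha:
            return t
    return None


# Hypothetical toy data: higher score means a better match.
toy = [(9.0, False), (8.0, False), (7.0, False), (6.0, True),
       (5.0, False), (4.0, False), (3.0, True), (2.0, False)]

plain = estimate_fdr(toy, 5.0)                       # 1 decoy / 4 targets
corrected = estimate_fdr(toy, 5.0, plus_one=True)    # (1 + 1) / 4 targets
cut_plain = lowest_threshold(toy, 0.3)
cut_corrected = lowest_threshold(toy, 0.3, plus_one=True)
```

On this toy list the +1 variant is strictly more conservative: at the 0.3 level the plain estimate accepts a threshold, while the corrected estimate accepts none because the list is too short. Whether either formula matches the one analyzed in the paper should be checked against the full text.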
