False Discovery Rate Estimation in Proteomics

With advances in proteomic separation techniques and improvements in mass analyzers, the volume of data generated in mass spectrometry-based proteomics experiments is growing rapidly. Such large datasets require automated computational tools for high-throughput analysis and appropriate statistical control. The data are typically searched using one or more popular database search algorithms. The peptide-spectrum matches assigned by these tools include false positives, so the matches must be statistically validated before any biological interpretation is made; without such validation, biological inferences may be unreliable or outright misleading. Because the scores of true and false matches overlap considerably, no score threshold can separate them perfectly. Controlling false positives among a set of accepted matches therefore requires a statistical estimate of how many false positives that set contains. The false discovery rate (FDR), the expected proportion of false positives among the accepted matches, is the standard metric for assessing global confidence in a large-scale proteomics dataset. This chapter covers the basics of FDR, its application in proteomics, and methods for estimating it.
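One widely used way to estimate FDR is the target-decoy strategy: spectra are searched against real (target) sequences together with decoy sequences, such as reversed ones, that cannot produce true matches, and decoy hits above a score threshold serve as a proxy for false target hits above that threshold. The sketch below is illustrative only. It assumes each match is reduced to a hypothetical `(score, is_decoy)` pair from a concatenated search, with higher scores being better, and uses the simple estimator FDR(t) = #decoys / #targets at or above threshold t. Other estimator variants exist, and this is not presented as the chapter's own method. The helper name `target_decoy_qvalues` is hypothetical. It also converts FDRs into q-values, the smallest FDR at which a given match would still be accepted.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Sketch: estimate q-values from (score, is_decoy) pairs.

    Assumes a concatenated target-decoy search and that higher scores are better.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n = len(ranked)
    fdrs = [0.0] * n
    targets = decoys = 0
    i = 0
    while i < n:
        # Consume a whole group of tied scores so that every match at
        # threshold t sees the same counts of matches scoring >= t.
        j = i
        while j < n and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                decoys += 1
            else:
                targets += 1
            j += 1
        # Estimated FDR at this threshold: decoys / targets, capped at 1.
        fdr = min(decoys / targets, 1.0) if targets else 1.0
        for k in range(i, j):
            fdrs[k] = fdr
        i = j

    # q-value: the minimum estimated FDR over all thresholds at or below
    # this match's score, i.e. a running minimum from the bottom of the list.
    qvals = [0.0] * n
    running_min = 1.0
    for k in range(n - 1, -1, -1):
        running_min = min(running_min, fdrs[k])
        qvals[k] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


if __name__ == "__main__":
    # Toy data (made up for illustration): True marks a decoy match.
    psms = [(10.0, False), (9.0, False), (8.0, True),
            (7.0, False), (6.0, False), (5.0, True)]
    accepted = [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= 0.01]
    print(accepted)  # [10.0, 9.0]
```

With this toy data, only the two highest-scoring target matches pass a 1% FDR cutoff, because the first decoy appears at score 8.0. Reporting q-values instead of a single FDR lets a user choose any acceptance threshold after the fact.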
