Bayesian false discovery rates for post-translational modification proteomics

Tandem mass spectrometry-based proteomics enables high-throughput analysis of post-translational modifications (PTMs) on proteins. In current shotgun proteomics research, peptides with various PTMs and peptides without PTMs are often identified together, and a single overall false discovery rate (FDR) is estimated. However, often only a subset of the identifications, e.g. those with specific PTMs, is emphasized or reported. The FDR of the reported results may then be seriously under- or overestimated, and unreliable conclusions may be drawn from it. Unfortunately, this risk has not been widely recognized in the field, and there is still no agreement on the right way to control the FDR of PTM identifications. As a result, the ostrich policy is commonly adopted, wittingly or unwittingly: the simplistic overall estimate is assumed to apply. This paper, for the first time, proves that the FDRs of various PTM identifications are in theory not equivalent to the overall FDR and quantifies several major factors influencing their relationships. Elaborate simulation experiments are carried out to verify the theoretical conclusions empirically. Strategies are suggested for better control of PTM FDRs.
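The gap between an overall FDR and a subgroup FDR can be illustrated with a small numerical sketch. It is not the paper's derivation or simulation design. It assumes a simple two-groups mixture in which false and correct identifications have normally distributed scores that are the same in both subgroups, and in which the subgroups differ only in their share of false identifications. All numbers (subgroup weights, false proportions, score means) and all function names are hypothetical choices made for illustration.

```python
import math


def upper_tail(t, mean):
    """P(score > t) for a unit-variance normal centred at `mean`."""
    return 0.5 * math.erfc((t - mean) / math.sqrt(2.0))


def fdr(t, pi0, false_mean=0.0, true_mean=3.0):
    """Expected FDR above threshold t when a fraction pi0 of matches is false.

    Assumption: false scores ~ N(false_mean, 1), correct scores ~ N(true_mean, 1).
    """
    false_mass = pi0 * upper_tail(t, false_mean)
    true_mass = (1.0 - pi0) * upper_tail(t, true_mean)
    return false_mass / (false_mass + true_mass)


def threshold_for(target, pi0, lo=0.0, hi=10.0):
    """Bisect for the score threshold whose FDR equals `target`.

    Under the normal assumptions above, FDR decreases as the threshold rises.
    """
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if fdr(mid, pi0) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical subgroups: (share of all identifications, fraction that is false).
groups = {"unmodified": (0.9, 0.3), "ptm": (0.1, 0.8)}

# With identical score distributions, the pooled false fraction is the
# weighted average of the subgroup false fractions.
overall_pi0 = sum(weight * pi0 for weight, pi0 in groups.values())

# One threshold chosen so that the pooled (overall) FDR is 1%.
t = threshold_for(0.01, overall_pi0)
overall_fdr = fdr(t, overall_pi0)

# FDR within each subgroup at that same threshold.
group_fdr = {name: fdr(t, pi0) for name, (weight, pi0) in groups.items()}

print(f"threshold = {t:.3f}, overall FDR = {overall_fdr:.4f}")
for name, value in group_fdr.items():
    print(f"{name:>10s} FDR = {value:.4f}")
```

Under these assumed numbers, a threshold tuned to an overall FDR of 1% leaves the PTM subgroup with an FDR of roughly 7%, while the unmodified subgroup sits slightly below 1%. Reporting only the PTM identifications at the overall 1% figure would therefore understate their error rate severalfold.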
