Bayesian Confidence Intervals for Multiplexed Proteomics Integrate Ion-statistics with Peptide Quantification Concordance

Multiplexed proteomics has emerged as a powerful tool to measure protein expression levels across multiple conditions. The relative protein abundances are inferred by comparing the signals generated by isobaric tags, which encode the samples’ origins. Intuitively, the trust associated with a protein measurement depends on the similarity of ratios from different peptides and on the signal level of these measurements. To date, however, peptide-level information has not typically been integrated into confidence estimates, and only the most likely values for relative protein abundances are reported. If confidence is reported, it is based on protein-level measurement agreement between replicates. Here we present BACIQ, a mathematically rigorous approach that integrates peptide intensities and peptide-measurement agreement into confidence intervals for protein ratios. The main advantages of BACIQ are: 1) it removes the need to threshold reported peptide signal based on an arbitrary cut-off, thereby reporting more measurements from a given experiment; 2) confidence can be assigned without replicates; 3) for repeated experiments, BACIQ provides confidence intervals for the union, not the intersection, of quantified proteins; 4) for repeated experiments, BACIQ confidence intervals are more predictive than confidence intervals based on protein-level measurement agreement. Our method therefore drastically increases the value obtained from quantitative proteomics experiments and will help researchers to interpret their data and prioritize resources. To make our approach easily accessible, we distribute it via an R/Stan package.
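To illustrate why signal level matters for confidence, the following is a minimal sketch and not the BACIQ model itself, whose details are not given in this abstract. It assumes a two-channel experiment, treats reporter-ion counts as binomial draws, and places a uniform Beta(1, 1) prior on the fraction of signal in channel A. The function name `fraction_credible_interval` and its parameters are hypothetical, and peptide-to-peptide agreement is deliberately ignored by pooling all peptides.

```python
import random


def fraction_credible_interval(counts_a, counts_b, level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval for the fraction of signal in channel A.

    Assumptions (illustrative only, not taken from the paper):
    - counts_a / counts_b are per-peptide ion counts in two channels;
    - counts are pooled across peptides and modelled as binomial;
    - the prior on the channel-A fraction is uniform, i.e. Beta(1, 1).
    """
    total_a = sum(counts_a)
    total_b = sum(counts_b)

    # With a Beta(1, 1) prior, the posterior is Beta(1 + total_a, 1 + total_b).
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + total_a, 1 + total_b) for _ in range(draws)
    )

    # Read the interval bounds off the sorted posterior draws.
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Same underlying 1:1 ratio, measured at low and at high signal.
low_signal = fraction_credible_interval([5, 8], [6, 7])
high_signal = fraction_credible_interval([500, 800], [600, 700])
print("low signal: ", low_signal)
print("high signal:", high_signal)
```

Both intervals are centred near 0.5, but the high-signal interval is much narrower, which is the intuition behind letting ion statistics inform confidence rather than discarding peptides below a fixed cut-off. A fuller treatment would also let disagreement between peptides widen the interval.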
