MS-EmpiRe utilizes peptide-level noise distributions for ultra-sensitive detection of differentially abundant proteins

Mass spectrometry-based proteomics is the method of choice for quantifying genome-wide differential changes in protein abundance across a wide range of biological and biomedical applications. Protein-level changes must be reliably derived from a large number of measured peptide intensities and their corresponding fold changes, yet the peptide fold changes of a given protein vary considerably. Numerous instrumental setups aim to reduce this variability, whereas current computational methods account for it only implicitly. We introduce a new method, MS-EmpiRe (github.com/zimmerlab/MS-EmpiRe), which explicitly accounts for the noise underlying peptide fold changes. We derive dataset-specific, intensity-dependent empirical error distributions and use them to weight each peptide fold change individually when detecting differentially abundant proteins. The method requires only peptide intensities mapped to proteins and can therefore be applied to any common quantitative proteomics setup. On a recently published proteome-wide benchmarking dataset, MS-EmpiRe doubles the number of correctly identified changing proteins at a correctly estimated FDR cutoff compared with state-of-the-art tools. We confirm the superior performance of MS-EmpiRe on simulated data. MS-EmpiRe provides rapid processing (< 2 min) and is an easy-to-use, general-purpose tool.
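
To make the idea of intensity-dependent empirical error distributions more concrete, here is a simplified sketch under stated assumptions; it is not the MS-EmpiRe implementation. It assumes that fold changes between replicates of the same condition reflect pure noise, groups them into intensity bins (the bin count and bin-selection rule are assumptions), converts each between-condition peptide fold change into a z-score via the empirical CDF of its own bin, and combines peptide z-scores per protein with Stouffer's method, which is a swapped-in choice that treats peptide scores as independent. All function names (`build_noise_model`, `empirical_z`, `score_protein`) are hypothetical, and the input data are synthetic.

```python
import bisect
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, maps probabilities to z-scores


def log_stats(a, b):
    """Return (mean log2 intensity, log2 fold change) for one intensity pair."""
    la, lb = math.log2(a), math.log2(b)
    return (la + lb) / 2, la - lb


def build_noise_model(replicate_pairs, n_bins=2):
    """Group replicate-vs-replicate log2 fold changes into intensity bins.

    replicate_pairs holds (intensity_rep1, intensity_rep2) for peptides measured
    twice under the SAME condition (assumption: their fold changes are noise only).
    Returns a list of (bin_center, sorted_noise_fold_changes).
    """
    records = sorted(log_stats(a, b) for a, b in replicate_pairs)
    size = math.ceil(len(records) / n_bins)
    model = []
    for start in range(0, len(records), size):
        chunk = records[start:start + size]
        center = sum(intensity for intensity, _ in chunk) / len(chunk)
        model.append((center, sorted(fc for _, fc in chunk)))
    return model


def empirical_z(fold_change, model, log_intensity):
    """Score a fold change against the noise bin closest in intensity."""
    _, noise = min(model, key=lambda item: abs(item[0] - log_intensity))
    below = bisect.bisect_right(noise, fold_change)  # noise values <= observed
    p = (below + 0.5) / (len(noise) + 1)  # continuity-corrected ECDF, in (0, 1)
    return STD_NORMAL.inv_cdf(p)


def score_protein(peptides, model):
    """Combine one protein's peptide z-scores (Stouffer), two-sided p-value."""
    zs = []
    for a, b in peptides:  # (intensity_condition1, intensity_condition2)
        log_intensity, fc = log_stats(a, b)
        zs.append(empirical_z(fc, model, log_intensity))
    z = sum(zs) / math.sqrt(len(zs))
    return z, 2 * (1 - STD_NORMAL.cdf(abs(z)))


# Synthetic, deterministic replicate noise: wide at low intensity (~2**10),
# narrow at high intensity (~2**20). Purely illustrative numbers.
noise_pairs = []
for i in range(200):
    s = ((i % 20) - 9.5) / 8  # evenly spaced offsets in [-1.1875, 1.1875]
    noise_pairs.append((2 ** (10 + s / 2), 2 ** (10 - s / 2)))    # log2 FC = s
    noise_pairs.append((2 ** (20 + s / 20), 2 ** (20 - s / 20)))  # log2 FC = s / 10
model = build_noise_model(noise_pairs, n_bins=2)

# Three peptides with the same 2-fold change (log2 FC = 1) at two intensities,
# plus an unchanged high-intensity protein.
_, p_high = score_protein([(2 ** 20.5, 2 ** 19.5)] * 3, model)
_, p_low = score_protein([(2 ** 10.5, 2 ** 9.5)] * 3, model)
_, p_flat = score_protein([(2 ** 20, 2 ** 20)] * 3, model)
print(f"{p_high:.1e} {p_low:.3f} {p_flat:.2f}")
```

In this toy setting, the same two-fold change is far more significant in the high-intensity bin, where replicate noise is narrow, than in the low-intensity bin, where noise alone often produces fold changes of that size. This is one way to read "individual weighting": each peptide fold change counts according to how unusual it is relative to the noise observed at its own intensity.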
