Robust summarization and inference in proteome-wide label-free quantification

Label-free quantitative mass spectrometry-based workflows for differential expression (DE) analysis of proteins pose important challenges for data analysis due to peptide-specific effects and context-dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods, which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these peptide-based methods are computationally expensive, often hard to understand for the non-specialised end-user, and do not provide protein summaries, which are important for visualisation or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared to the state-of-the-art peptide-based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob’s model parameters in a two-stage procedure, circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob’s superior performance while providing useful protein expression summaries for plotting and downstream analysis. Summarising peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which gives researchers full flexibility to develop data analysis workflows tailored towards their specific applications.
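The general shape of a two-stage workflow (first summarise peptide intensities to one value per protein and sample, then infer DE on those summaries) can be illustrated with a minimal sketch. The abstract does not specify the estimators MSqRobSum uses, so the code below swaps in a simplified median polish for stage one and a plain difference of group means for stage two; it is not the MSqRobSum implementation. The function names, the toy data and the two-group layout are all hypothetical, and the sketch assumes log2 intensities with NaN for missing values and at least one observed value per peptide and per sample.

```python
import numpy as np


def median_polish_summary(log_intensities, n_iter=10, tol=1e-8):
    """Stage 1 (illustrative): summarise a peptides x samples matrix of
    log2 intensities into one value per sample.

    Simplified median polish; NaN marks a missing intensity. Assumes every
    peptide (row) and every sample (column) has at least one observation.
    """
    resid = np.array(log_intensities, dtype=float)
    row = np.zeros(resid.shape[0])  # peptide-specific effects
    col = np.zeros(resid.shape[1])  # sample effects
    for _ in range(n_iter):
        # Sweep out peptide (row) medians, ignoring missing values.
        row_med = np.nanmedian(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        # Sweep out sample (column) medians, ignoring missing values.
        col_med = np.nanmedian(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        if max(np.abs(row_med).max(), np.abs(col_med).max()) < tol:
            break
    # Protein-level summary per sample: sample effect plus a typical peptide level.
    return col + np.median(row)


def log2_fold_change(summary, group_a, group_b):
    """Stage 2 (illustrative): contrast of group means on the protein summaries."""
    summary = np.asarray(summary, dtype=float)
    return summary[group_b].mean() - summary[group_a].mean()


# Hypothetical toy protein: 4 peptides, 6 samples (3 per condition).
peptide_effects = np.array([0.0, 2.0, -1.0, 0.5])
sample_levels = np.array([10.0, 10.0, 10.0, 11.0, 11.0, 11.0])
toy = peptide_effects[:, None] + sample_levels[None, :]
toy[1, 0] = np.nan  # a missing peptide intensity
toy[2, 5] = np.nan  # another one

toy_summary = median_polish_summary(toy)
toy_lfc = log2_fold_change(toy_summary, [0, 1, 2], [3, 4, 5])
print(toy_summary, toy_lfc)
```

Because the peptide-specific effects are removed before the summaries are compared, the second stage only has to handle one value per protein and sample, which is where the reduction in computational and model complexity comes from. A real analysis would replace the mean contrast with a proper statistical model fitted on the summaries.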
