QPROT: Statistical method for testing differential expression using protein-level intensity data in label-free quantitative proteomics.

We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT extends the QSPEC suite, originally developed for spectral count data, to continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Differential expression of each protein is determined from a standardized Z-statistic derived from the posterior distribution of the log fold change parameter, with the significance threshold guided by the false discovery rate (FDR) estimated by a well-known empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the Clinical Proteomic Technology Assessment for Cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, and assessed FDR accuracy in the latter.

BIOLOGICAL SIGNIFICANCE: QPROT is a statistical framework with an accompanying software tool for comparative quantitative proteomics analysis. It features several extensions of the QSPEC method, originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics.
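The decision rule summarized above (a standardized Z-statistic from the posterior of the log fold change, thresholded with an empirical Bayes FDR estimate) can be illustrated with a minimal sketch. This is not QPROT's implementation: the posterior draws are simulated, the function names `z_statistic` and `local_fdr` are hypothetical, the null density is assumed to be standard normal, the mixture density is estimated with a simple Gaussian kernel estimate, and the null proportion `pi0` is fixed at the conservative value 1.

```python
import math
import random
import statistics


def z_statistic(posterior_draws):
    """Standardize posterior draws of a log fold change: mean / sd."""
    return statistics.fmean(posterior_draws) / statistics.stdev(posterior_draws)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and sd sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def local_fdr(z_scores, pi0=1.0):
    """Two-groups local FDR, pi0 * f0(z) / f(z), capped at 1.

    f0 is assumed to be N(0, 1); f is a Gaussian kernel density estimate
    with a normal-reference rule-of-thumb bandwidth (an illustrative choice).
    """
    n = len(z_scores)
    h = 1.06 * statistics.stdev(z_scores) * n ** (-1 / 5)
    fdr = []
    for z in z_scores:
        f = sum(normal_pdf(z, zi, h) for zi in z_scores) / n
        fdr.append(min(1.0, pi0 * normal_pdf(z) / f))
    return fdr


def demo(seed=1):
    """Simulate 180 unchanged and 20 changed proteins (assumed values)."""
    rng = random.Random(seed)
    z_scores = []
    for i in range(200):
        true_lfc = 2.0 if i >= 180 else 0.0
        estimate = true_lfc + rng.gauss(0.0, 0.5)       # noisy point estimate
        draws = [rng.gauss(estimate, 0.5) for _ in range(500)]  # mock posterior
        z_scores.append(z_statistic(draws))
    fdr = local_fdr(z_scores)
    called = [i for i, q in enumerate(fdr) if q < 0.05]
    print(f"{len(called)} proteins called at local FDR < 0.05")


if __name__ == "__main__":
    demo()
```

Fixing `pi0` at 1 overstates the FDR slightly, which errs on the side of fewer calls; an actual analysis would estimate the null proportion and the null density from the data.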
