gpGrouper: A Peptide Grouping Algorithm for Gene-Centric Inference and Quantitation of Bottom-Up Proteomics Data

gpGrouper is a gene-centric peptide inference and quantitation algorithm that prevents the gene origin mixing and isoform omission that occur in parsimonious protein-centric approaches. A simple classification schema indicates distinguishable gene products, with shared peptide quantities distributed by ratios of corresponding unique peptides. This approach accurately determines tumor content and deconvolves proteomes from mixed-species patient-derived xenografts without eliminating species-shared peptides. iBAQ quantities are calculated from label-free, isotopic, or isobaric data, allowing comparisons within and across samples and methodologies.

Highlights
Gene-centric inference algorithm with classification for distinguishable groups.
Shared peptides are split proportionally to corresponding unique peptide ratios.
iBAQ values are calculated for label-free, isotopic, or isobaric labeling methods.
Universally handles single- or mixed-species PDX data with accurate deconvolution.

In quantitative mass spectrometry, the method by which peptides are grouped into proteins can have dramatic effects on downstream analyses. Here we describe gpGrouper, an inference and quantitation algorithm that offers an alternative method for assignment of protein groups by gene locus and improves pseudo-absolute iBAQ quantitation by weighted distribution of shared peptide areas. We experimentally show that distributing shared peptide quantities based on unique peptide peak ratios improves quantitation accuracy compared with conventional winner-take-all scenarios. Furthermore, gpGrouper seamlessly handles two-species samples such as patient-derived xenografts (PDXs) without ignoring the host species or species-shared peptides. This is a critical capability for proper evaluation of proteomics data from PDX samples, where stromal infiltration varies across individual tumors. Finally, gpGrouper calculates peptide peak area (MS1)-based expression estimates from multiplexed isobaric data, producing iBAQ results that are directly comparable across label-free, isotopic, and isobaric proteomics approaches.
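
The weighted distribution of shared peptide areas described above can be illustrated with a minimal Python sketch. This is not the gpGrouper implementation: the function names, the input layout, and the even-split fallback for genes with no unique peptides are assumptions made for illustration. The `ibaq` helper uses the general iBAQ definition (summed peptide intensity divided by the number of theoretically observable peptides).

```python
from collections import defaultdict


def distribute_shared_area(unique_area, shared_peptides):
    """Split each shared peptide's area among its candidate genes in
    proportion to those genes' summed unique-peptide areas.

    unique_area     -- dict: gene -> summed area of peptides unique to it
    shared_peptides -- list of (area, [genes]) for peptides mapping to
                       more than one gene
    Returns a dict: gene -> unique area plus its share of shared areas.
    (Hypothetical interface, not gpGrouper's API.)
    """
    totals = defaultdict(float, unique_area)
    for area, genes in shared_peptides:
        weights = [unique_area.get(gene, 0.0) for gene in genes]
        weight_sum = sum(weights)
        if weight_sum == 0:
            # Assumption: with no unique evidence for any gene, split evenly.
            weights = [1.0] * len(genes)
            weight_sum = float(len(genes))
        for gene, weight in zip(genes, weights):
            totals[gene] += area * weight / weight_sum
    return dict(totals)


def ibaq(total_area, n_theoretical_peptides):
    """General iBAQ definition: summed area per theoretically observable peptide."""
    return total_area / n_theoretical_peptides


# Toy data: GENE_A and GENE_B have unique areas in a 3:1 ratio,
# so a shared peptide of area 400 is split 300 / 100.
unique = {"GENE_A": 300.0, "GENE_B": 100.0}
shared = [(400.0, ["GENE_A", "GENE_B"])]
totals = distribute_shared_area(unique, shared)
print(totals)  # {'GENE_A': 600.0, 'GENE_B': 200.0}
```

In a winner-take-all scheme the full shared area of 400 would go to GENE_A alone, giving 700 and 100. The proportional split instead keeps the 3:1 ratio implied by the unique peptides.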
