Bayesian Proteoform Modeling Improves Protein Quantification of Global Proteomic Measurements

As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that, with this increase in throughput, estimating protein abundance from the native measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternative splicing of the RNA transcript, post-translational modifications, or other possible proteoforms, which affects a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian Proteoform Quantification model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that fall outside the dominant pattern, or the existence of multiple overexpressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves accuracy similar to that of current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MATLAB® and R packages.
