Comparison of normalization methods in clinical research applications of mass spectrometry-based proteomics

Large-scale proteomic studies must contend with unwanted variability, especially when samples originate from different centers or when multiple analytical batches are needed. Such variability is typically introduced at every step of a clinical study, from biological sample collection and storage through sample preparation and spectral data acquisition to peptide/protein quantification. To remove this variability, the protein data are normalized. Several published works compare normalization methods in the -omics field, but far fewer focus on proteomic data generated with mass spectrometry (MS), and most of those have dealt only with small datasets. As a case study, we focused on the normalization of a large quantitative MS-based proteomic dataset obtained with isobaric tandem-mass tagging (TMT) of plasma samples from a pan-European cohort of overweight and obese subjects. We evaluated several normalization methods, namely standardization, quantile sample normalization, removal of unwanted variation (RUV), ComBat, mean and median centering, and single standard normalization; some of these methods are generic, while others were created specifically for genomic or metabolomic data. We assessed how the relationships between proteins and clinical variables were affected after normalizing the data with each method, and we compared the normalized datasets using an array of diagnostic plots. Some methods were well suited to this particular large-scale shotgun proteomic dataset of human plasma samples. In particular, quantile sample normalization, RUV, and mean and median centering performed very well, whereas quantile protein normalization gave results inferior to those obtained with unnormalized data.
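To make two of the generic methods named above concrete, the following is a minimal sketch of median centering and quantile sample normalization. It is not the implementation used in the study. It assumes log-scale intensities with no missing values, one list of protein values per sample, and it takes "quantile sample normalization" to mean giving every sample the same distribution of values; the function names are hypothetical.

```python
from statistics import mean, median


def median_center(samples):
    """Subtract each sample's own median from its values.

    `samples` is a list of samples, each a list of log-scale protein
    intensities (an assumption of this sketch).
    """
    centered = []
    for sample in samples:
        m = median(sample)
        centered.append([value - m for value in sample])
    return centered


def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    The value at each rank is replaced by the mean, across samples, of
    the values at that rank. Ties are broken by position rather than
    averaged, which is a simplification.
    """
    n = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [mean(s[i] for s in sorted_samples) for i in range(n)]

    normalized = []
    for sample in samples:
        # Indices of this sample's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    data = [[1.0, 3.0, 2.0], [4.0, 6.0, 8.0]]  # two toy samples, three proteins
    print(median_center(data))
    print(quantile_normalize(data))
```

Median centering only shifts each sample, so it preserves the shape of each sample's distribution, whereas quantile normalization forces the shapes to be identical. That difference is one reason the two can behave differently on the same dataset.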
