Systematic Evaluation of Normalization Methods for Glycomics Data Based on Performance of Network Inference

Glycomics measurements, like those of all other high-throughput technologies, are subject to technical variation caused by fluctuations in experimental conditions. Removing this non-biological signal from the data is referred to as normalization. In contrast to other omics data types, no systematic evaluation of normalization options for glycomics data has been published so far. In this paper, we assess the quality of different normalization strategies for glycomics data with a novel approach. It has been shown previously that Gaussian Graphical Models (GGMs) inferred from glycomics data can identify enzymatic steps in the glycan synthesis pathways in a data-driven fashion. Building on this finding, we quantify the quality of a given normalization method by how well a GGM inferred from the correspondingly normalized data reconstructs known synthesis reactions in the glycosylation pathway. The method therefore relies on a biological, rather than a purely statistical, measure of quality. We analyzed 23 different normalization combinations applied to six large-scale glycomics cohorts across three experimental platforms (LC-ESI-MS, UHPLC-FLD and MALDI-FTICR-MS). Based on our results, we recommend normalizing glycan data using the ‘Probabilistic Quotient’ method followed by log-transformation, irrespective of the measurement platform.
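
To make the recommended procedure concrete, the following is a minimal sketch of probabilistic quotient normalization followed by log-transformation. It is an illustration, not the authors' implementation. The function name `pqn_log`, the use of the feature-wise median across samples as the reference profile, the natural logarithm, and the assumption of strictly positive intensities without missing values are all choices made here for the example; the abstract does not specify them.

```python
import numpy as np


def pqn_log(intensities, reference=None):
    """Probabilistic quotient normalization followed by a natural log.

    intensities: 2-D array, rows = samples, columns = glycan features.
    reference:   optional 1-D reference profile; defaults to the
                 feature-wise median across samples (an assumption of
                 this sketch, not something stated in the abstract).
    Assumes all values are strictly positive and non-missing.
    """
    X = np.asarray(intensities, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected a samples x features matrix")
    if np.any(X <= 0) or np.any(~np.isfinite(X)):
        raise ValueError("sketch requires positive, finite intensities")

    if reference is None:
        reference = np.median(X, axis=0)

    # Quotient of every feature against the reference profile.
    quotients = X / reference

    # The median quotient of each sample estimates its dilution factor.
    dilution = np.median(quotients, axis=1, keepdims=True)

    # Remove the estimated dilution, then log-transform.
    return np.log(X / dilution)


if __name__ == "__main__":
    base = np.array([10.0, 20.0, 30.0, 40.0])
    # Three samples that differ only by a global scaling factor.
    data = np.vstack([base, 2.0 * base, 0.5 * base])
    normalized = pqn_log(data)
    # After normalization the three rows should coincide.
    print(np.allclose(normalized[0], normalized[1]))
    print(np.allclose(normalized[0], normalized[2]))
```

In this toy input the samples differ only by a global scaling factor, so the estimated dilution factors absorb the entire difference and the normalized rows coincide. Real glycomics data would additionally need a strategy for zeros and missing values before the logarithm is taken.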
