Mixture model normalization for non-targeted gas chromatography/mass spectrometry metabolomics data

Background: Metabolomics offers a unique integrative perspective for health research, reflecting genetic and environmental contributions to disease-related phenotypes. Identifying robust associations in population-based or large-scale clinical studies demands large numbers of subjects and therefore sample batching for gas-chromatography/mass spectrometry (GC/MS) non-targeted assays. When run over weeks or months, technical noise due to batch and run-order threatens data interpretability. Application of existing normalization methods to metabolomics is challenged by unsatisfied modeling assumptions and, notably, failure to address batch-specific truncation of low abundance compounds.

Results: To curtail technical noise and make GC/MS metabolomics data amenable to analyses describing biologically relevant variability, we propose mixture model normalization (mixnorm) that accommodates truncated data and estimates per-metabolite batch and run-order effects using quality control samples. Mixnorm outperforms other approaches across many metrics, including improved correlation of non-targeted and targeted measurements and superior performance when metabolite detectability varies according to batch. For some metrics, particularly when truncation is less frequent for a metabolite, mean centering and median scaling demonstrate comparable performance to mixnorm.

Conclusions: When quality control samples are systematically included in batches, mixnorm is uniquely suited to normalizing non-targeted GC/MS metabolomics data due to explicit accommodation of batch effects, run order and varying thresholds of detectability. Especially in large-scale studies, normalization is crucial for drawing accurate conclusions from non-targeted GC/MS metabolomics data.
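To make the comparison with simpler approaches concrete, the following minimal sketch illustrates one common reading of batch-wise mean centering for a single metabolite. It is not mixnorm and not the authors' implementation; the function name `mean_center_by_batch`, the use of `None` for undetected compounds, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_center_by_batch(values, batches):
    """Subtract each batch's mean from its observed log-abundances.

    values  : list of floats, or None where the compound was not detected
    batches : list of batch labels, same length as values
    (Hypothetical helper for illustration; undetected values are skipped.)
    """
    observed = defaultdict(list)
    for v, b in zip(values, batches):
        if v is not None:
            observed[b].append(v)

    # Batch means are computed from detected values only.
    batch_means = {b: mean(vs) for b, vs in observed.items()}

    # Undetected values stay undetected; detected values are centered.
    return [None if v is None else v - batch_means[b]
            for v, b in zip(values, batches)]


# Toy log-abundances for one metabolite across two batches (made-up numbers).
log_abundance = [10.0, 12.0, None, 13.0, 15.0, 14.0]
batch = ["A", "A", "A", "B", "B", "B"]
centered = mean_center_by_batch(log_abundance, batch)
```

Because the batch mean here is computed only from detected values, a batch in which low abundances are truncated will have an upward-biased mean. This is the kind of batch-specific truncation that the abstract says existing methods fail to address.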
