Statistical methods for handling unwanted variation in metabolomics data.

Metabolomics experiments are inevitably subject to unwanted variation arising from factors such as batch effects, long runs of samples, and confounding biological variation. Although removing this unwanted variation is a vital step in the analysis of metabolomics data, it remains a gray area: there is a recognized need for a better understanding of the procedures and statistical methods required to achieve statistically sound and biologically relevant outcomes. In this paper, we discuss the causes of unwanted variation in metabolomics experiments, review commonly used approaches for handling it, and present a statistical approach for removing unwanted variation to obtain normalized metabolomics data. The advantages and performance of the approach relative to several widely used metabolomics normalization approaches are illustrated through two metabolomics studies, and recommendations are provided for choosing and assessing the most suitable normalization method for a given metabolomics experiment. Software for the approach is made freely available.
