Normalization and integration of large-scale metabolomics data using support vector regression

Introduction
Untargeted metabolomics studies for biomarker discovery often involve hundreds to thousands of human samples. Data acquisition for such large sample sets must be divided into several batches and may span months or even several years. Intra- and inter-batch signal drift of metabolites during data acquisition is unavoidable and is a major confounding factor in large-scale metabolomics studies.

Objectives
We aimed to develop a data normalization method that reduces unwanted variation and integrates multiple batches in large-scale metabolomics studies prior to statistical analysis.

Methods
We developed a method for normalizing and integrating large-scale metabolomics data based on the machine learning algorithm support vector regression (SVR). We also developed an R package, MetNormalizer, that implements SVR normalization for data processing.

Results
After SVR normalization, the proportion of metabolite ion peaks with relative standard deviations (RSDs) below 30% increased to more than 90% of all peaks, considerably higher than the proportion achieved with other common normalization methods. Reducing unwanted analytical variation helps improve the classification and prediction accuracy of both unsupervised and supervised multivariate statistical analyses, so that subtle metabolic changes in epidemiological studies can be detected.

Conclusion
SVR normalization effectively removes unwanted intra- and inter-batch variation and outperforms other common normalization methods.
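To make the Methods and Results more concrete, the following is a minimal Python sketch of SVR-based drift correction together with the "RSD below 30%" criterion. It rests on several assumptions not stated in the abstract: pooled quality-control (QC) samples are injected at regular intervals, injection order is the only predictor, and one scikit-learn `SVR` model is fitted per peak. The function names `svr_drift_correct` and `rsd_percent`, the parameter values, and the simulated single-batch data with a linear downward drift are all hypothetical. This is not the MetNormalizer implementation.

```python
import numpy as np
from sklearn.svm import SVR


def rsd_percent(values: np.ndarray) -> np.ndarray:
    """Relative standard deviation (%) of each column (one column per peak)."""
    return 100.0 * values.std(axis=0, ddof=1) / values.mean(axis=0)


def svr_drift_correct(intensity, order, is_qc, C=1.0, epsilon=0.05):
    """Hypothetical QC-based drift correction, fitted peak by peak.

    intensity: (n_injections, n_peaks) raw peak areas
    order:     (n_injections,) injection order
    is_qc:     boolean mask of pooled QC injections (an assumption here)
    """
    order = np.asarray(order, dtype=float)
    # Scale injection order to [0, 1] so gamma="scale" gives a sensible RBF width.
    x = ((order - order.min()) / (order.max() - order.min())).reshape(-1, 1)
    corrected = np.empty(intensity.shape, dtype=float)
    for j in range(intensity.shape[1]):
        y = intensity[:, j].astype(float)
        qc_median = np.median(y[is_qc])
        # Fit the QC signal, relative to its median, against injection order.
        model = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma="scale")
        model.fit(x[is_qc], y[is_qc] / qc_median)
        # Predicted relative drift for every injection (QCs and study samples);
        # clipping guards against division by a non-positive prediction.
        drift = np.clip(model.predict(x), 1e-6, None)
        corrected[:, j] = y / drift
    return corrected


# Simulated data: 60 injections, 20 peaks, every 4th injection is a pooled QC.
rng = np.random.default_rng(0)
n_inj, n_peaks = 60, 20
order = np.arange(n_inj)
is_qc = order % 4 == 0
t = order / (n_inj - 1)
baseline = rng.uniform(1e3, 1e4, n_peaks)
drift = 1.0 - 0.8 * t                      # simulated downward signal drift
bio = np.where(is_qc[:, None], 1.0, rng.lognormal(0.0, 0.2, (n_inj, n_peaks)))
noise = rng.normal(1.0, 0.05, (n_inj, n_peaks))
raw = baseline * drift[:, None] * bio * noise

corrected = svr_drift_correct(raw, order, is_qc)
before = rsd_percent(raw[is_qc])
after = rsd_percent(corrected[is_qc])
print(f"peaks with QC RSD < 30%: {np.mean(before < 30):.0%} -> {np.mean(after < 30):.0%}")
```

Because the same QC injections are used both to fit the models and to compute the RSDs, the after-correction figure in this sketch is optimistic; holding out some QC injections gives a fairer estimate. Fitting the QC signal relative to its median keeps `epsilon` on a unit-free scale across peaks whose intensities differ by orders of magnitude.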
