Performance Evaluation and Online Realization of Data-driven Normalization Methods Used in LC/MS based Untargeted Metabolomics Analysis

In untargeted metabolomics analysis, several factors (e.g., unwanted experimental and biological variation and technical errors) can hamper the identification of differential metabolic features, so data-driven normalization is required before feature selection. So far, at least 16 normalization methods have been widely applied to LC/MS based metabolomics data. However, the performance and the sample size dependence of these methods have not yet been exhaustively compared, and no online tool has been available for comparatively and comprehensively evaluating the performance of all 16 methods. In this study, a comprehensive comparison of these methods was conducted. As a result, the 16 methods were categorized into three groups based on their normalization performance across various sample sizes. VSN, Log Transformation, and PQN were identified as the best-performing methods, while Contrast consistently underperformed across all sub-datasets of the different benchmark data. Moreover, an interactive web tool that comprehensively evaluates the performance of the 16 methods, specifically for normalizing LC/MS based metabolomics data, was constructed and is hosted at http://server.idrb.cqu.edu.cn/MetaPre/. In summary, this study can serve as useful guidance for selecting suitable normalization methods when analyzing LC/MS based metabolomics data.
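
To make concrete what a data-driven normalization does, the following is a minimal sketch of two of the methods named above, probabilistic quotient normalization (PQN) and log transformation, applied to a samples-by-features intensity matrix. It is an illustration under stated assumptions, not the implementation used in this study or in the MetaPre server: the function names, the use of the feature-wise median as the PQN reference profile, and the offset of 1 in the log transformation are all choices made here for the example.

```python
import numpy as np


def pqn_normalize(X):
    """Probabilistic quotient normalization (illustrative sketch).

    X is a (samples x features) matrix of non-negative intensities.
    Assumption: the reference profile is the feature-wise median
    across all samples; other references are possible.
    """
    X = np.asarray(X, dtype=float)
    reference = np.median(X, axis=0)
    # Skip features whose reference intensity is zero to avoid division by zero.
    usable = reference > 0
    quotients = X[:, usable] / reference[usable]
    # The median quotient of each sample estimates its dilution factor.
    dilution = np.median(quotients, axis=1)
    return X / dilution[:, np.newaxis]


def log_transform(X, offset=1.0):
    """Log2 transformation; the offset (an assumption) keeps zeros finite."""
    return np.log2(np.asarray(X, dtype=float) + offset)


# Hypothetical toy data: one metabolic profile measured at three dilutions.
base_profile = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X_demo = np.vstack([base_profile * 1.0, base_profile * 2.0, base_profile * 0.5])

normalized = pqn_normalize(X_demo)
logged = log_transform(normalized)
```

In this toy example the three rows differ only by a global dilution factor, so PQN maps all of them back onto the same profile; real data would also contain feature-specific biological differences that the median quotient is meant to leave largely intact.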
