Visualization of statistically processed LC-MS-based metabolomics data for identifying significant features in a multiple-group comparison

Abstract Analyzing and presenting data from multiple groups is much more informative than doing so for only two groups. However, common tools such as the S-plot and the volcano plot can only identify significant features between two groups and are therefore of limited use in multiple-group comparisons. This study proposes novel visualization plots that overcome this restriction by using the p values of multiple-group tests as the x-axis. The plots come in a parametric and a nonparametric version: the parametric method combines analysis of variance (ANOVA) with Welch’s ANOVA, and the nonparametric method uses the Kruskal-Wallis test. To select significant features, machine learning algorithms were used to determine the cutting points on the x-axis. As a proof of concept, real data from experiments on 4-MeO-α-PVP metabolites and on fish spoilage metabolomics were visualized with the proposed method. The results showed that the novel plots efficiently identify significant metabolites in multiple-group comparisons. In particular, the nonparametric method combined with cutting points determined by logistic regression gave higher positive predictive values than the other machine learning algorithms used to determine cutting points for multiple groups.
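The p values that serve as the x-axis can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `multigroup_p_values`, the synthetic data, and the -log10 transform of the p values are assumptions made for illustration, and the parametric branch uses only classical one-way ANOVA (Welch's ANOVA, which the abstract also mentions, is omitted). It assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats


def multigroup_p_values(groups, parametric=False):
    """Return one p value per feature for a multiple-group comparison.

    groups: list of 2-D arrays (samples x features), one array per group.
    parametric=False uses the Kruskal-Wallis test; parametric=True uses
    classical one-way ANOVA (Welch's ANOVA is not covered in this sketch).
    """
    n_features = groups[0].shape[1]
    test = stats.f_oneway if parametric else stats.kruskal
    p = np.empty(n_features)
    for j in range(n_features):
        # Compare feature j across all groups at once.
        p[j] = test(*(g[:, j] for g in groups)).pvalue
    return p


# Hypothetical data: 3 groups, 10 samples each, 5 features.
rng = np.random.default_rng(0)
groups = [rng.normal(0.0, 1.0, size=(10, 5)) for _ in range(3)]
groups[2][:, 0] += 5.0  # make feature 0 differ strongly in the third group

p_kw = multigroup_p_values(groups)                     # nonparametric
p_anova = multigroup_p_values(groups, parametric=True)  # parametric

# Assumed x-axis transform (volcano-plot style); the source only states
# that the p values are used as the x-axis.
x_kw = -np.log10(p_kw)
```

In this sketch the shifted feature receives the smallest p value and therefore the largest x-axis coordinate; where to place the cutting point on that axis is the step the study assigns to machine learning algorithms.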
