Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data

Background
Bioinformatic tools for the enrichment analysis of ‘omics’ datasets facilitate the interpretation and understanding of data. To date, few are suitable for metabolomics datasets. The main objective of this work is to give, for the first time, a critical overview of the performance of these tools. To that aim, datasets were selected from metabolomic repositories and enriched datasets were created. Both types of data were analysed with these tools, and the outputs were thoroughly examined.

Results
An exploratory multivariate analysis of the most widely used tools for metabolite set enrichment was performed, based on non-metric multidimensional scaling (NMDS) of Jaccard distances, and it reflected their diversity. The identifiers of the dataset metabolites were searched in several metabolite databases (HMDB, KEGG, PubChem, ChEBI, BioCyc/HumanCyc, LipidMAPS, ChemSpider, METLIN and Recon2). The databases containing the most identifiers for the dataset metabolites were PubChem, followed by METLIN and ChEBI. However, these databases had duplicated entries and might yield false positives. The performance of over-representation analysis (ORA) tools, including BioCyc/HumanCyc, ConsensusPathDB, IMPaLA, MBRole, MetaboAnalyst, Metabox, MetExplore, MPEA, PathVisio and Reactome, and of the mapping tool KEGGREST, was examined. Results were mostly consistent among tools, and between real and enriched data, despite the variability of the tools. Nevertheless, a few discrepancies, such as differences in the total number of metabolites, were also found.
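An NMDS of Jaccard distances starts from a pairwise distance matrix between tools. The sketch below shows one way to build that matrix. It assumes each tool is described by a set of binary features. The tool names and feature labels are hypothetical placeholders, not the features used in this study.

```python
from itertools import combinations

# Hypothetical feature profiles: each tool is described by the set of binary
# characteristics it has (e.g. analysis method, pathway database, interface).
# These labels are illustrative only, not the study's actual tool features.
TOOLS = {
    "ToolA": {"ORA", "KEGG", "HMDB_ids", "web"},
    "ToolB": {"ORA", "KEGG", "Reactome", "web"},
    "ToolC": {"ORA", "BioCyc", "R_package"},
}


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles: dict) -> tuple[list[str], list[list[float]]]:
    """Build a symmetric pairwise Jaccard distance matrix in sorted tool order."""
    names = sorted(profiles)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays 0.0
    for i, j in combinations(range(n), 2):
        d = jaccard_distance(profiles[names[i]], profiles[names[j]])
        matrix[i][j] = matrix[j][i] = d
    return names, matrix


if __name__ == "__main__":
    names, matrix = distance_matrix(TOOLS)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

The resulting matrix could then be passed to a non-metric MDS routine that accepts precomputed dissimilarities. Examples include `metaMDS` from the R package vegan and scikit-learn's `MDS` in its non-metric mode. Exact parameter names vary between library versions.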
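ORA asks whether a pathway contains more of the query metabolites than would be expected by chance. A common formulation is a one-sided hypergeometric test, and the sketch below assumes that formulation. Individual tools may instead use Fisher's exact test or another statistic. They may also define the background (reference) set differently, which could be one source of discrepancies in metabolite totals between tools. The function name `ora_pvalue` is hypothetical.

```python
from math import comb


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for a single pathway.

    N: metabolites in the background (universe) set
    K: background metabolites annotated to the pathway
    n: metabolites in the query list (e.g. significantly altered ones)
    k: query metabolites that fall in the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of drawing k or more pathway members by chance.
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so infeasible terms contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 100 background metabolites, 10 in the pathway,
    # 5 in the query list, 3 of which map to the pathway.
    print(ora_pvalue(N=100, K=10, n=5, k=3))
```

With these hypothetical counts the p-value is approximately 0.0066. Tools usually compute one such p-value per pathway, so results are typically also corrected for multiple testing before interpretation.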
Disease-based enrichment analyses were also assessed, but they were not found to be accurate. This is probably because metabolite-disease sets are not up to date and because predicting diseases from a list of metabolites is difficult.

Conclusions
We have extensively reviewed the state of the art of the available tools for metabolomic datasets, the completeness of metabolite databases, and the performance of ORA methods and disease-based analyses. Despite the variability of the tools, they provided consistent results independent of their analytic approach. However, more work is required on the completeness of metabolite and pathway databases, because it strongly affects the accuracy of enrichment analyses. Such improvements will translate into more accurate and global insights into the metabolome.