Mind the Gap: Mapping Mass Spectral Databases in Genome-Scale Metabolic Networks Reveals Poorly Covered Areas

The use of mass spectrometry-based metabolomics to study human, plant and microbial biochemistry and their interactions with the environment largely depends on the ability to annotate metabolite structures by matching the mass spectral features of measured metabolites to curated spectra of reference standards. While reference databases for metabolomics now provide information on hundreds of thousands of compounds, barely 5% of these known small molecules have experimental data from pure standards. Remarkably, it is still unknown how well existing mass spectral libraries cover the biochemical landscape of prokaryotic and eukaryotic organisms. To address this issue, we investigated the coverage of 38 genome-scale metabolic networks by public and commercial mass spectral databases and found that, on average, only 40% of the nodes in these networks could be mapped to mass spectral information from standards. Next, we computationally determined which parts of the human metabolic network are poorly covered by mass spectral libraries, revealing gaps in eicosanoid, vitamin and bile acid metabolism. Finally, a network topology analysis based on the betweenness centrality of metabolites identified the 20 most important metabolites that, if added to MS databases, may facilitate the characterization of the human metabolome in the future.
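The two quantities the abstract relies on, the fraction of network nodes covered by a spectral library and the betweenness centrality used to prioritize uncovered metabolites, can be illustrated with a minimal sketch. It assumes the metabolic network has already been reduced to an undirected metabolite graph (an adjacency dict) and that library matching has already produced a set of covered node identifiers; the function names and the toy identifiers `M1` to `M5` are hypothetical and this is not the authors' pipeline. Betweenness is computed with Brandes' algorithm for unweighted graphs, unnormalized.

```python
from collections import deque


def coverage(nodes, library):
    """Fraction of network metabolites that have a reference spectrum in the library."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    return len(nodes & set(library)) / len(nodes)


def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' algorithm.

    Assumes `adj` is an undirected, unweighted graph given as
    {node: iterable_of_neighbours} with symmetric entries.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is visited from both endpoints.
    return {v: c / 2 for v, c in bc.items()}


def top_uncovered(adj, library, k):
    """Rank metabolites missing from the library by betweenness (ties broken by name)."""
    bc = betweenness(adj)
    uncovered = [v for v in adj if v not in library]
    return sorted(uncovered, key=lambda v: (-bc[v], v))[:k]


if __name__ == "__main__":
    # Toy network with hypothetical metabolite identifiers.
    adj = {
        "M1": ["M2"],
        "M2": ["M1", "M3", "M4"],
        "M3": ["M2"],
        "M4": ["M2", "M5"],
        "M5": ["M4"],
    }
    library = {"M1", "M3", "M5"}
    print(coverage(adj, library))             # 0.6
    print(top_uncovered(adj, library, 2))     # ['M2', 'M4']
```

In this toy graph, three of five nodes are covered (60%), and the two uncovered hubs `M2` and `M4` lie on the most shortest paths, so they would be the first candidates to add to a library. How the real graph is built (for example, whether highly connected currency metabolites are removed first) strongly affects such rankings and is not specified here.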
