Consistency, Inconsistency and Ambiguity of Metabolite Names in Biochemical Databases Used for Genome Scale Metabolic Modelling

Genome-scale metabolic models (GEMs) are manually curated repositories that describe the metabolic capabilities of an organism. GEMs have been used successfully in research areas ranging from systems medicine to biotechnology. However, the databases used to build GEMs follow different naming conventions (namespaces), which limits model reusability and prevents the integration of existing models. The GEM community is aware of this problem, but its extent has not been analyzed in depth. In this study, we investigate the ambiguity of metabolite names and the multiplicity of non-systematic identifiers. We also highlight how consistently or inconsistently these names and identifiers are used across eleven databases of biochemical reactions, and we examine the problems that arise when mapping between different namespaces and databases. We found that such inconsistencies can reach 83.1%, which emphasizes the need for strategies to address them. Currently, manual verification of the mappings appears to be the only way to remove inconsistencies when combining models. Finally, we discuss several possible approaches to facilitate unambiguous mapping in the future.
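To make the notions of name ambiguity, identifier multiplicity and mapping inconsistency concrete, here is a minimal sketch. It assumes that each namespace is available as (identifier, name) pairs and that cross-references are stored as dicts from an identifier to a set of target identifiers. The name normalization (case-folding), the round-trip consistency criterion, the function names and the toy identifiers (`X:1`, `Y:10`, ...) are all hypothetical illustrations. They are not the metrics, software or data used in the study.

```python
from collections import defaultdict


def name_ambiguity(entries):
    """Fraction of distinct names that are linked to more than one identifier.

    entries: iterable of (identifier, name) pairs from a single namespace.
    """
    ids_per_name = defaultdict(set)
    for identifier, name in entries:
        # Hypothetical normalization: strip whitespace and ignore case.
        ids_per_name[name.strip().casefold()].add(identifier)
    if not ids_per_name:
        return 0.0
    ambiguous = sum(1 for ids in ids_per_name.values() if len(ids) > 1)
    return ambiguous / len(ids_per_name)


def names_per_identifier(entries):
    """Number of distinct normalized names (synonyms) recorded per identifier."""
    names = defaultdict(set)
    for identifier, name in entries:
        names[identifier].add(name.strip().casefold())
    return {identifier: len(synonyms) for identifier, synonyms in names.items()}


def round_trip_inconsistency(a_to_b, b_to_a):
    """Fraction of mapped identifiers in A that do not map back only to themselves.

    Assumed criterion: following A -> B -> A should return exactly {a}.
    """
    mapped = [a for a, targets in a_to_b.items() if targets]
    if not mapped:
        return 0.0
    inconsistent = 0
    for a in mapped:
        back = set()
        for b in a_to_b[a]:
            back |= b_to_a.get(b, set())
        if back != {a}:
            inconsistent += 1
    return inconsistent / len(mapped)


# Toy, made-up namespace X: "glucose" is attached to two different identifiers.
entries = [("X:1", "glucose"), ("X:1", "D-glucose"), ("X:2", "Glucose"), ("X:3", "pyruvate")]

# Toy cross-references: X:1 and X:2 both map to Y:10, which maps back only to X:1.
a_to_b = {"X:1": {"Y:10"}, "X:2": {"Y:10"}, "X:3": {"Y:30"}}
b_to_a = {"Y:10": {"X:1"}, "Y:30": {"X:3"}}

print(name_ambiguity(entries))
print(names_per_identifier(entries))
print(round_trip_inconsistency(a_to_b, b_to_a))
```

In this toy example, one of the three normalized names ("glucose") is ambiguous, and one of the three mapped identifiers (`X:2`) fails the round trip. Real comparisons would also have to decide how aggressively to normalize names. That choice directly changes how much ambiguity is reported.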
