Towards rule-based metabolic databases: a requirement analysis based on KEGG

Knowledge of metabolic processes is collected in easily accessible online databases that are growing rapidly in content and detail. Using these databases for the automatic construction of metabolic network models requires high accuracy and consistency. In this two-part study, we evaluate current accuracy and consistency problems, using the KEGG database as a prominent example, and propose design principles for dealing with such problems. In the first part, we present our computational approach for classifying inconsistencies and provide an overview of the classes of inconsistencies we identified. We detected inconsistencies both in database entries referring to substances and in entries referring to reactions. In the second part, we present strategies for dealing with the detected problem classes. In particular, we propose a rule-based database approach that allows for the inclusion of parameterised molecular species and parameterised reactions. Detailed case studies and a comparison of explicit networks from KEGG with their anticipated rule-based representation underline the applicability and scalability of this approach.
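
To make the idea of parameterised species and reactions more concrete, the following minimal Python sketch shows how a single rule might stand in for a whole family of explicit reactions. It is an illustration under stated assumptions, not the representation used in the study: the class names `ParamSpecies` and `ParamReaction`, the single integer parameter `n`, and the toy chain-elongation rule are all hypothetical and are not taken from KEGG.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ParamSpecies:
    """Hypothetical species family with one integer parameter n."""
    template: str  # e.g. "Acyl-CoA(C{n})"

    def instantiate(self, n: int) -> str:
        # Produce one explicit species name for a concrete parameter value.
        return self.template.format(n=n)


@dataclass(frozen=True)
class ParamReaction:
    """Hypothetical rule: substrate family at n -> product family at n + shift."""
    name: str
    substrate: ParamSpecies
    product: ParamSpecies
    shift: int

    def expand(self, values: range) -> List[Tuple[str, str]]:
        # Enumerate the explicit (substrate, product) pairs covered by the rule.
        return [
            (self.substrate.instantiate(n), self.product.instantiate(n + self.shift))
            for n in values
        ]


# Toy example (assumption): elongation of an acyl chain by two carbons.
acyl = ParamSpecies("Acyl-CoA(C{n})")
rule = ParamReaction("chain elongation", substrate=acyl, product=acyl, shift=2)

# One rule stands in for six explicit reactions (n = 4, 6, ..., 14).
explicit = rule.expand(range(4, 16, 2))
for substrate, product in explicit:
    print(f"{substrate} -> {product}")
```

In this sketch the database would store one rule, and an explicit network is generated only on demand. This is the kind of comparison between explicit and rule-based representations that the abstract refers to.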
