Precise generation of systems biology models from KEGG pathways

Background
The KEGG PATHWAY database provides a wealth of pathways for a wide range of organisms. All pathway components are directly linked to other KEGG databases, such as KEGG COMPOUND or KEGG REACTION. The pathways can therefore be extended with an enormous amount of information and provide a foundation for initial structural modeling approaches. However, KGML-formatted KEGG pathways are designed primarily for visualization and often omit important details for the sake of a clear arrangement of their entries. A direct conversion into systems biology models would thus produce incomplete and erroneous models.

Results
Here, we present a precise method for processing and converting KEGG pathways into initial metabolic and signaling models encoded in the standardized community pathway formats SBML (Levels 2 and 3) and BioPAX (Levels 2 and 3). The method corrects invalid or incomplete KGML content, creates complete and valid stoichiometric reactions, translates relations into signaling models, and augments the pathway content with additional information, such as cross-references to Entrez Gene, OMIM, UniProt, ChEBI, and many more.

Finally, we compare several existing conversion tools for KEGG pathways and show that the conversion from KEGG to BioPAX involves no loss of information, whereas lossless translation to SBML is only possible with SBML Level 3, using its recently proposed qualitative models and groups extension packages.

Conclusions
Building correct BioPAX and SBML signaling models from the KEGG database is a unique characteristic of the proposed method. Furthermore, no other approach can appropriately construct metabolic models from KEGG pathways, including correct reactions with stoichiometry. The resulting initial models, which contain valid and comprehensive SBML or BioPAX code and a multitude of cross-references, lay the foundation for further modeling steps.
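To make the reaction-completion step more concrete, here is a minimal sketch of one way it could work. It is not the authors' implementation. It reads a KGML `<reaction>` element, which lists substrates and products but no stoichiometric coefficients. It then takes the coefficients, and any participants missing from KGML, from an equation string written in the style of a KEGG REACTION `EQUATION` line. Several details are assumptions for illustration: the reaction ID `rn:R_EXAMPLE` is hypothetical, the equation is hard-coded rather than retrieved from KEGG, and the helper names `parse_side` and `complete_reactions` are made up. The compound IDs are KEGG's C00027 (hydrogen peroxide), C00007 (oxygen), and C00001 (water).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical KGML fragment: element/attribute names follow KGML, but the
# reaction ID is a placeholder. Water is deliberately omitted, as a pathway
# drawn for visualization might omit it; KGML has no coefficients at all.
KGML = """<pathway name="path:example" org="ko" number="00000" title="example">
  <reaction id="1" name="rn:R_EXAMPLE" type="irreversible">
    <substrate id="10" name="cpd:C00027"/>
    <product id="11" name="cpd:C00007"/>
  </reaction>
</pathway>"""

# Assumed equation in the style of a KEGG REACTION EQUATION line
# (2 H2O2 -> O2 + 2 H2O). Hard-coded here instead of looked up in KEGG.
EQUATIONS = {"rn:R_EXAMPLE": "2 C00027 <=> C00007 + 2 C00001"}

# Optional integer coefficient followed by a KEGG compound ID.
TERM = re.compile(r"^(?:(\d+)\s+)?(C\d{5})$")


def parse_side(side: str) -> dict[str, int]:
    """Map 'cpd:Cxxxxx' to its coefficient for one side of an equation."""
    result: dict[str, int] = {}
    for term in side.split(" + "):
        match = TERM.match(term.strip())
        if match is None:
            # e.g. polymer terms such as 'n C00001' are not handled here
            raise ValueError(f"unsupported term: {term!r}")
        coeff = int(match.group(1) or 1)
        cpd = "cpd:" + match.group(2)
        result[cpd] = result.get(cpd, 0) + coeff
    return result


def complete_reactions(kgml: str, equations: dict[str, str]) -> dict[str, dict]:
    """Combine KGML reactions with equations to get stoichiometric reactions."""
    root = ET.fromstring(kgml)
    out: dict[str, dict] = {}
    for rxn in root.iter("reaction"):
        name = rxn.get("name")
        # Compounds that KGML itself lists for this reaction.
        listed = {el.get("name") for el in rxn if el.tag in ("substrate", "product")}
        left, right = equations[name].split("<=>")
        substrates, products = parse_side(left), parse_side(right)
        out[name] = {
            "reversible": rxn.get("type") == "reversible",
            "substrates": substrates,
            "products": products,
            # Participants present in the equation but missing from KGML.
            "added": sorted((set(substrates) | set(products)) - listed),
        }
    return out


if __name__ == "__main__":
    for name, reaction in complete_reactions(KGML, EQUATIONS).items():
        print(name, reaction)
```

The sketch also assumes that the left side of the equation corresponds to the KGML substrates. A real converter would have to check this orientation against the KGML entry and flip the equation when needed, and it would also have to handle non-integer or variable coefficients.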
