A systematic comparison of the MetaCyc and KEGG pathway databases

Background: The MetaCyc and KEGG projects have developed large metabolic pathway databases that are used for a variety of applications, including genome analysis and metabolic engineering. We present a comparison of the compound, reaction, and pathway content of MetaCyc version 16.0 and a KEGG version downloaded on February 27, 2012, to increase understanding of their relative sizes, their degree of overlap, and their scope. To assess their overlap, we must know the correspondences between compounds, reactions, and pathways in MetaCyc and those in KEGG. We devoted significant effort to computational and manual matching of these entities, and we evaluated the accuracy of the correspondences.

Results: KEGG contains 179 module pathways versus 1,846 base pathways in MetaCyc; KEGG contains 237 map pathways versus 296 super pathways in MetaCyc. KEGG pathways contain 3.3 times as many reactions on average as do MetaCyc pathways, and the databases employ different conceptualizations of metabolic pathways. KEGG contains 8,692 reactions versus 10,262 for MetaCyc. Of these, 6,174 KEGG reactions are components of KEGG pathways versus 6,348 for MetaCyc. KEGG contains 16,586 compounds versus 11,991 for MetaCyc. Of these, 6,912 KEGG compounds act as substrates in KEGG reactions versus 8,891 for MetaCyc. MetaCyc contains a broader set of database attributes than does KEGG, such as relationships from a compound to the enzymes that it regulates, identification of spontaneous reactions, and the expected taxonomic range of metabolic pathways. MetaCyc contains many pathways not found in KEGG, from plants, fungi, metazoa, and actinobacteria; KEGG contains pathways not found in MetaCyc, for xenobiotic degradation, glycan metabolism, and metabolism of terpenoids and polyketides. MetaCyc contains fewer unbalanced reactions, which facilitates metabolic modeling such as flux-balance analysis. MetaCyc includes generic reactions that may be instantiated computationally.

Conclusions: KEGG contains significantly more compounds than does MetaCyc, whereas MetaCyc contains significantly more reactions and pathways than does KEGG; in particular, KEGG modules are quite incomplete. The numbers of reactions occurring in pathways in the two databases are quite similar.
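The counts reported above can be turned into coverage fractions, which makes the contrast between absolute and relative sizes easier to see. The following is a minimal sketch that uses only the figures quoted in the abstract; the dictionary layout and the helper name `coverage` are illustrative assumptions, not part of either database's tooling.

```python
# Counts quoted in the abstract (MetaCyc 16.0; KEGG downloaded Feb 27, 2012).
# The dictionary keys are hypothetical names chosen for this illustration.
counts = {
    "KEGG": {
        "reactions": 8692,
        "reactions_in_pathways": 6174,
        "compounds": 16586,
        "compounds_as_substrates": 6912,
    },
    "MetaCyc": {
        "reactions": 10262,
        "reactions_in_pathways": 6348,
        "compounds": 11991,
        "compounds_as_substrates": 8891,
    },
}


def coverage(db_counts):
    """Return the fraction of reactions that occur in pathways and the
    fraction of compounds that act as reaction substrates."""
    return {
        "reactions_in_pathways": (
            db_counts["reactions_in_pathways"] / db_counts["reactions"]
        ),
        "compounds_as_substrates": (
            db_counts["compounds_as_substrates"] / db_counts["compounds"]
        ),
    }


fractions = {name: coverage(c) for name, c in counts.items()}

for name, frac in fractions.items():
    print(
        f"{name}: {frac['reactions_in_pathways']:.1%} of reactions are in pathways, "
        f"{frac['compounds_as_substrates']:.1%} of compounds are substrates"
    )
```

Read this way, the similar absolute numbers of pathway reactions correspond to different proportions of each database's total reaction set, and a larger share of MetaCyc compounds than of KEGG compounds participates in reactions.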
