The outcomes of pathway database computations depend on pathway ontology

Different pathway databases rest on different biological notions of a pathway. These pathway ontologies significantly affect pathway computations: computational users of pathway databases will obtain different results depending on the ontology of the database they employ, and different ontologies are preferable for different end uses. We explore these differences by comparing the BioCyc and KEGG ontologies. The BioCyc ontology defines a pathway as a conserved, atomic module of the metabolic network of a single organism, one that is often regulated as a unit and whose boundaries are defined at high-connectivity stable metabolites. KEGG pathways are on average 4.2 times larger than BioCyc pathways and combine multiple biological processes from different organisms to produce a substrate-centered reaction mosaic. We compared KEGG and BioCyc pathways using genome context methods, which determine the functional relatedness of pairs of genes. For each method we employed, a pair of genes randomly selected from a BioCyc pathway is more likely to be related by that method than is a pair of genes randomly selected from a KEGG pathway. This supports the conclusion that the BioCyc pathway conceptualization is closer to a single conserved biological process than is that of KEGG.
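
The pairwise comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `linked_pair_fraction`, the gene and pathway identifiers, and the toy link set are all hypothetical, and the sketch assumes that pooling every within-pathway gene pair is an acceptable stand-in for random pair selection. It computes the fraction of within-pathway gene pairs that a single genome context method reports as functionally related.

```python
from itertools import combinations


def linked_pair_fraction(pathways, linked_pairs):
    """Fraction of within-pathway gene pairs that one method reports as linked.

    pathways: dict mapping a pathway name to a set of gene identifiers.
    linked_pairs: set of frozenset({gene_a, gene_b}) predicted by one
        genome context method (hypothetical input format).
    """
    total = 0
    linked = 0
    for genes in pathways.values():
        # Enumerate every unordered pair of genes inside this pathway.
        for a, b in combinations(sorted(genes), 2):
            total += 1
            if frozenset((a, b)) in linked_pairs:
                linked += 1
    return linked / total if total else 0.0


# Toy data (invented for illustration): the same five genes grouped into
# two small modules versus one large mosaic pathway.
small_pathways = {"p1": {"g1", "g2", "g3"}, "p2": {"g4", "g5"}}
large_pathways = {"P": {"g1", "g2", "g3", "g4", "g5"}}
links = {
    frozenset(("g1", "g2")),
    frozenset(("g2", "g3")),
    frozenset(("g4", "g5")),
}

small_fraction = linked_pair_fraction(small_pathways, links)
large_fraction = linked_pair_fraction(large_pathways, links)
print(small_fraction, large_fraction)
```

In this toy case, three of the four pairs in the small modules are linked, against three of the ten pairs in the single large pathway. The numbers only show how merging processes into a larger pathway dilutes the linked fraction; they say nothing about the actual BioCyc or KEGG results.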
