Prediction of functional modules based on comparative genome analysis and Gene Ontology application

We present a computational method for predicting functional modules encoded in microbial genomes. We also develop a formal measure that quantifies the degree of consistency between predicted and known modules, and we assess the statistical significance of the resulting consistency scores. The method first evaluates the functional relationship between a pair of genes from three perspectives: phylogenetic profile analysis, gene neighborhood analysis, and Gene Ontology (GO) assignments. It then combines these three sources of information in a Bayesian inference framework and uses the combined evidence to measure the strength of the functional relationship between the two genes. Finally, a threshold-based method is applied to predict functional modules. Applying this method to Escherichia coli K-12, we predicted 185 functional modules, which are highly consistent with the previously known functional modules in E. coli. These results indicate that the approach is promising for predicting the functional modules encoded in a microbial genome.
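
The abstract does not give the exact form of the Bayesian combination or of the threshold-based step. One common way to realize such a pipeline, shown here only as a sketch under stated assumptions, is a naive-Bayes combination of per-source likelihood ratios followed by single-linkage grouping of gene pairs whose combined score passes a threshold. The conditional independence of the three sources, the likelihood-ratio values, the prior, the threshold, and the gene names are all hypothetical; in practice the likelihood ratios would have to be estimated from a reference set of known modules.

```python
from math import prod

# Hypothetical sketch: the abstract does not specify the exact Bayesian model
# or thresholding rule. Here we ASSUME the three evidence sources (phylogenetic
# profile, gene neighborhood, GO) are conditionally independent and each
# supplies a likelihood ratio LR = P(evidence | linked) / P(evidence | not linked).


def posterior_linked(prior, likelihood_ratios):
    """Naive-Bayes combination: posterior odds = prior odds * product of LRs."""
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds * prod(likelihood_ratios)
    return odds / (1.0 + odds)


def predict_modules(genes, pair_scores, threshold):
    """Link every gene pair whose score is >= threshold, then return the
    connected components with at least two genes (single linkage; an
    assumed stand-in for the paper's threshold-based method)."""
    parent = {g: g for g in genes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(m) for m in groups.values() if len(m) > 1)


# Hypothetical genes and per-source LRs: [phylogenetic, neighborhood, GO].
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
evidence = {
    ("geneA", "geneB"): [8.0, 5.0, 3.0],
    ("geneB", "geneC"): [6.0, 6.0, 4.0],
    ("geneD", "geneE"): [0.5, 1.0, 0.8],
}
prior = 0.01  # hypothetical prior probability that a random pair is linked
scores = {pair: posterior_linked(prior, lrs) for pair, lrs in evidence.items()}
modules = predict_modules(genes, scores, threshold=0.5)
print(modules)  # [['geneA', 'geneB', 'geneC']]
```

Single linkage lets one strong pair merge two groups, so it can chain weakly related genes together; a stricter grouping rule would trade some recall for precision.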