Prediction of functional modules based on gene distributions in microbial genomes.

We present a computational method for predicting functional modules that can be applied directly to newly sequenced microbial genomes to predict gene functions and the component genes of biological pathways. We first quantify the functional relatedness among genes based on their distribution (i.e., their presence and order) across multiple microbial genomes, obtaining a gene network in which every pair of genes is assigned a score representing their functional relatedness. We then apply a threshold-based clustering algorithm to this network to obtain modules such that the number of genes in each module is bounded from above by a pre-specified value and the genes within a module are more strongly functionally related to each other than to genes in other predicted modules. In particular, when the module size is bounded by 130, we obtain 167 functional modules covering 813 genes for Escherichia coli K12 and 138 functional modules covering 731 genes for Bacillus subtilis subsp. subtilis str. 168. We used Gene Ontology (GO) information to assess the predictions by comparing the GO similarities among genes of the same functional module with the GO similarities among genes that are randomly clustered together. This comparison indicates that the predicted functional modules are statistically and biologically significant, and that genes of the same functional module share more commonality in terms of biological process than in terms of molecular function or cellular component. We also examined the predicted functional modules common to both Escherichia coli K12 and Bacillus subtilis subsp. subtilis str. 168 and provide explanations for some of them.
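To make the clustering step concrete, the following is a minimal sketch of one way a threshold-based clustering with an upper bound on module size could work. It is not the algorithm from the paper, whose details the abstract does not give. The sketch assumes that modules are connected components of the network restricted to edges at or above a score threshold, and that any component exceeding the bound is split again at the next higher score. The names `components`, `bounded_modules`, `edges`, and `max_size`, and the toy scores, are hypothetical.

```python
from collections import defaultdict


def components(nodes, edges, threshold):
    """Connected components of `nodes`, using only edges with score >= threshold."""
    adj = defaultdict(set)
    for (u, v), score in edges.items():
        if score >= threshold and u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def bounded_modules(edges, max_size):
    """Split the gene network into modules of at most `max_size` genes.

    Assumption (not from the paper): an oversized component is re-split
    at the next higher distinct score until it fits under the bound.
    """
    if not edges:
        return []
    nodes = {gene for pair in edges for gene in pair}
    levels = sorted(set(edges.values()))  # candidate thresholds, ascending
    modules = []
    pending = [(nodes, 0)]  # (gene set, index of threshold to apply)
    while pending:
        group, i = pending.pop()
        for comp in components(group, edges, levels[i]):
            if len(comp) == 1:
                continue  # an isolated gene is not reported as a module
            if len(comp) <= max_size:
                modules.append(comp)
            elif i + 1 < len(levels):
                pending.append((comp, i + 1))
            # else: thresholding cannot split it further; it is dropped here
    return modules


# Toy network with hypothetical relatedness scores.
toy_edges = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("c", "d"): 0.2,
    ("d", "e"): 0.7,
}
toy_modules = bounded_modules(toy_edges, max_size=3)
```

In this toy network the weak link between `c` and `d` is the first to be cut, so genes within each resulting module are connected by higher scores than genes in different modules, which mirrors the property described above.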
