Connected gene neighborhoods in prokaryotic genomes.

A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These neighborhoods are typically not present in their entirety in any single genome but are held together by overlapping, partially conserved gene arrays. The procedure was applied to compare the order of orthologous genes, extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes. It identified 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods that include additional genes adjacent to the genes from the clusters and transcribed in the same direction; in total, 2387 COGs were included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon 'genomic hitchhiking'. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genomic hitchhiking is particularly typical of this neighborhood and of other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages.
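
The idea that a neighborhood can be "held together by overlapping, partially conserved gene arrays" without being present in any single genome can be illustrated with a small sketch. This is not the authors' procedure: it is a simplified illustration that assumes each genome is given as an ordered list of COG identifiers, treats an adjacent pair as conserved when it occurs in at least `min_genomes` genomes, ignores strand and gaps, and merges conserved pairs with a union-find structure. The function names, the threshold, and the toy genomes are all hypothetical.

```python
from collections import defaultdict


def conserved_pairs(genomes, min_genomes=2):
    """Return adjacent COG pairs seen in at least `min_genomes` genomes.

    `genomes` maps a genome name to its gene order as a list of COG ids.
    Orientation of the pair is ignored (assumption for this sketch).
    """
    support = defaultdict(set)
    for name, order in genomes.items():
        for a, b in zip(order, order[1:]):
            if a != b:
                support[frozenset((a, b))].add(name)
    return [tuple(sorted(pair)) for pair, names in support.items()
            if len(names) >= min_genomes]


def connected_neighborhoods(pairs):
    """Merge overlapping conserved pairs into connected sets of COGs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    # Largest neighborhoods first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Toy data (hypothetical): no genome contains A, B, C, D and E together.
toy_genomes = {
    "g1": ["A", "B", "C", "X", "Y"],
    "g2": ["B", "C", "D", "Y", "X"],
    "g3": ["C", "D", "E", "Q"],
    "g4": ["A", "B", "Z"],
    "g5": ["D", "E"],
}
neighborhoods = connected_neighborhoods(conserved_pairs(toy_genomes))
print(neighborhoods)
```

In this toy example the pairs A-B, B-C, C-D and D-E are each conserved in two genomes, so the five COGs end up in one connected neighborhood even though no genome carries all of them; X-Y forms a second, separate neighborhood.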
