Identifying clusters of functionally related genes in genomes

MOTIVATION: An increasing body of literature shows that genomes of eukaryotes can contain clusters of functionally related genes. Most approaches to identify gene clusters utilize microarray data or metabolic pathway databases to find groups of genes on chromosomes that are linked by common attributes. A generalized method that can find gene clusters regardless of the mechanism of origin would provide researchers with an unbiased method for finding clusters and studying the evolutionary forces that give rise to them.

RESULTS: We present an algorithm to identify gene clusters in eukaryotic genomes that utilizes functional categories defined in graph-based vocabularies such as the Gene Ontology (GO). Clusters identified in this manner need only have a common function and are not constrained by gene expression or other properties. We tested the algorithm by analyzing genomes of a representative set of species. We identified species-specific variation in percentage of clustered genes as well as in properties of gene clusters including size distribution and functional annotation. These properties may be diagnostic of the evolutionary forces that lead to the formation of gene clusters.

AVAILABILITY: A software implementation of the algorithm and example output files are available at http://fcg.tamu.edu/C_Hunter/.
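
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea of grouping neighboring genes that share a functional category. It is not the authors' method. It assumes each gene already carries a set of category labels (for a graph-based vocabulary such as GO, this would include ancestor terms), and the function name `find_clusters`, the parameters `max_gap` and `min_size`, and the labels `T1` and `T2` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=1, min_size=3):
    """Group neighboring genes that share a functional category.

    genes: list of (gene_id, set_of_terms) in chromosomal order.
    max_gap: hypothetical limit on unrelated genes allowed between two members.
    min_size: hypothetical minimum number of member genes per cluster.
    Returns a list of (term, [gene_id, ...]) tuples.
    """
    # Record, for every term, the chromosomal indices of genes annotated with it.
    positions = defaultdict(list)
    for index, (_gene_id, terms) in enumerate(genes):
        for term in terms:
            positions[term].append(index)

    clusters = []
    for term, indices in positions.items():
        run = [indices[0]]
        for i in indices[1:]:
            # Count the genes lying strictly between two annotated genes.
            if i - run[-1] - 1 <= max_gap:
                run.append(i)
            else:
                if len(run) >= min_size:
                    clusters.append((term, [genes[j][0] for j in run]))
                run = [i]
        if len(run) >= min_size:
            clusters.append((term, [genes[j][0] for j in run]))
    return clusters


# Toy chromosome: identifiers and labels are invented for illustration.
example = [
    ("g1", {"T1"}),
    ("g2", {"T1"}),
    ("g3", {"T2"}),
    ("g4", {"T1"}),
    ("g5", set()),
    ("g6", set()),
    ("g7", {"T1"}),
]
print(find_clusters(example))  # [('T1', ['g1', 'g2', 'g4'])]
```

In this toy input, g7 is excluded because two unannotated genes separate it from g4, which exceeds `max_gap=1`. A real analysis would also need a statistical test of whether such runs occur more often than expected by chance, which this sketch omits.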
