Clustering of housekeeping genes provides a unified model of gene order in the human genome

It is often supposed that, except for tandem duplicates, genes are randomly distributed throughout the human genome. However, recent analyses suggest that when all the genes expressed in a given tissue (notably placenta and skeletal muscle) are examined, these genes do not map to random locations but instead resolve to clusters. We have asked three questions: (i) does this clustering hold for most tissues, or are these tissues exceptions; (ii) is any clustering simply the result of the expression of tandem duplicates; and (iii) how, if at all, does this relate to the observed clustering of genes with high expression rates? We provide a unified model of gene clustering that explains the previous observations. We examined Serial Analysis of Gene Expression (SAGE) data for 14 tissues and found significant clustering in each tissue that persists even after the removal of tandem duplicates. We confirmed the clustering by analysis of independent expressed-sequence tag (EST) data. We then tested the possibility that the human genome is organized into subregions, each specializing in genes needed in a given tissue. By comparing genes expressed in different tissues, we show that this is not the case: those genes that seem to be tissue-specific in their expression do not, as a rule, cluster. Instead, we report that genes that are expressed in most tissues (housekeeping genes) show strong clustering. In addition, we show that the apparent clustering of genes with high expression rates is a consequence of the clustering of housekeeping genes.
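To make the notion of "significant clustering" concrete, the following is a minimal sketch of one way such a test could be set up. It is not the procedure used in the study. It assumes that genes on a chromosome are represented as an ordered list of flags (True if the gene is expressed in the tissue of interest, or is a housekeeping gene), that clustering is scored as the number of neighbouring pairs in which both genes are flagged, and that the null expectation comes from shuffling the flags along the chromosome. The function names, the statistic, and the toy data are all illustrative assumptions.

```python
import random


def adjacent_pairs(flags):
    """Count neighbouring gene pairs in which both genes are flagged."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a and b)


def clustering_p_value(flags, n_perm=2000, seed=0):
    """One-sided permutation test for clustering along a chromosome.

    flags  : list of bool in chromosomal gene order (hypothetical input).
    Returns the observed statistic and the estimated probability of
    seeing at least that many flagged neighbour pairs by chance.
    """
    rng = random.Random(seed)
    observed = adjacent_pairs(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random gene order, same number of flagged genes
        if adjacent_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy chromosomes of 40 genes, 10 of them flagged (illustrative data only).
clustered = [True] * 10 + [False] * 30        # flagged genes in one block
dispersed = ([True] + [False] * 3) * 10       # flagged genes evenly spaced

obs_clustered, p_clustered = clustering_p_value(clustered)
obs_dispersed, p_dispersed = clustering_p_value(dispersed)
print(obs_clustered, p_clustered)
print(obs_dispersed, p_dispersed)
```

In a setting like the one described above, tandem duplicates would be collapsed or removed before building the flag list, so that a run of duplicated genes does not by itself produce a significant result.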
