Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance.

Mammalian chromosomes are characterized by large-scale variations in DNA base composition (the so-called isochores). In contradiction with previous studies, Lercher et al. (Hum. Mol. Genet., 12, 2411, 2003) recently reported a strong correlation between gene expression breadth and GC-content, suggesting that there might be a selective pressure favoring the concentration of housekeeping genes in GC-rich isochores. We reassessed this issue by examining the correlation between gene expression and GC-content in human and mouse, using different measures of gene expression (EST, SAGE and microarray) and different measures of GC-content. We show that correlations between GC-content and expression are very weak and may vary according to the method used to measure expression. Such weak correlations have very low predictive value. The strong correlations reported by Lercher et al. (2003) arise because they measured variables over windows of neighboring genes. We show here that using gene windows artificially enhances the correlation. The assertion that the expression of a given gene depends on the GC-content of the region where it is located is therefore not supported by the data.
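The window effect described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the analysis performed in the study: it assumes that GC-content is constant within blocks of neighboring genes (a crude stand-in for isochores), that expression depends only weakly on GC-content, and that windows are non-overlapping and aligned with the blocks. The function names, block size, slope and seed are all hypothetical choices made for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def window_means(values, size):
    """Average values over consecutive, non-overlapping windows of `size` genes."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


def simulate(n_blocks=400, block_size=15, slope=0.1, seed=0):
    """Synthetic genes in chromosomal order (all parameters are assumptions).

    Each block of neighboring genes shares one GC value; expression is a weak
    linear function of GC plus independent gene-specific noise.
    """
    rng = random.Random(seed)
    gc, expression = [], []
    for _ in range(n_blocks):
        block_gc = rng.gauss(0, 1)
        for _ in range(block_size):
            gc.append(block_gc)
            expression.append(slope * block_gc + rng.gauss(0, 1))
    return gc, expression


gc, expression = simulate()

# Correlation measured gene by gene.
r_gene = pearson(gc, expression)

# Correlation measured on averages over windows of 15 neighboring genes.
r_window = pearson(window_means(gc, 15), window_means(expression, 15))

print(f"per-gene r = {r_gene:.3f}, per-window r = {r_window:.3f}")
```

In this toy setting, averaging over a window cancels much of the gene-specific noise in expression while leaving the regional GC signal intact, so the window-level correlation comes out higher than the gene-level one even though the per-gene dependence is unchanged. A high window-level correlation therefore says little about how well GC-content predicts the expression of any individual gene.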

[1]  A. Smit.  Interspersed repeats and other mementos of transposable elements in mammalian genomes. , 1999, Current opinion in genetics & development.

[2]  Alex E. Lash,et al.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository , 2002, Nucleic Acids Res..

[3]  G Bernardi,et al.  Isochores and the evolutionary genomics of vertebrates. , 2000, Gene.

[4]  M. Gouy,et al.  HOVERGEN: a database of homologous vertebrate genes. , 1994, Nucleic acids research.

[5]  F. Baas,et al.  The Human Transcriptome Map: Clustering of Highly Expressed Genes in Chromosomal Domains , 2001, Science.

[6]  A. Orth,et al.  Large-scale analysis of the human and mouse transcriptomes , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[7]  T. Liesegang.  The human transcriptome map: Clustering of highly expressed genes in chromosomal domains. Caron H, van Schaik B, van der Mee M, et al. Science 2001;291:1289–1292. , 2001.

[8]  Laurence D. Hurst,et al.  The evolution of isochores , 2001, Nature Reviews Genetics.

[9]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[10]  Ross Ihaka, Robert Gentleman.  R: A language for data analysis and graphics , 1996.

[11]  Simon C. Potter,et al.  An overview of Ensembl. , 2004, Genome research.

[12]  Tatiana A. Tatusova,et al.  NCBI Reference Sequence Project: update and current status , 2003, Nucleic Acids Res..

[13]  H. Bussemaker,et al.  The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. , 2003, Genome research.

[14]  L. Duret,et al.  Determinants of CpG islands: expression in early embryo and isochore structure. , 2001, Genome research.

[15]  G Bernardi,et al.  The mosaic genome of warm-blooded vertebrates. , 1985, Science.

[16]  Kamel Jabbari,et al.  The major shifts of human duplicated genes. , 2003, Gene.

[17]  Laurent Duret,et al.  Evolution of synonymous codon usage in metazoans. , 2002, Current opinion in genetics & development.

[18]  Peng Liang,et al.  SAGE Genie: A suite with panoramic view of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[19]  D. Gudbjartsson,et al.  A high-resolution recombination map of the human genome , 2002, Nature Genetics.

[20]  Lukas Wagner,et al.  A Greedy Algorithm for Aligning DNA Sequences , 2000, J. Comput. Biol..

[21]  L. Duret,et al.  Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores , 1995, Journal of Molecular Evolution.

[22]  L. Duret,et al.  Nature and structure of human genes that generate retropseudogenes. , 2000, Genome research.

[23]  M. Hattori,et al.  Chromosome-wide assessment of replication timing for human chromosomes 11q and 21q: disease-related genes in timing-switch regions. , 2002, Human molecular genetics.

[24]  Araxi O. Urrutia,et al.  The signature of selection mediated by expression on human genes. , 2003, Genome research.

[25]  G Bernardi,et al.  The distribution of genes in the human genome. , 1991, Gene.

[26]  Alexander E Vinogradov,et al.  Isochores and tissue-specificity. , 2003, Nucleic acids research.

[27]  Martin J. Lercher,et al.  Clustering of housekeeping genes provides a unified model of gene order in the human genome , 2002, Nature Genetics.

[28]  L. Duret,et al.  GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. , 2001, Genetics.

[29]  Araxi O. Urrutia,et al.  A unification of mosaic structures in the human genome. , 2003, Human molecular genetics.