Substantial Regional Variation in Substitution Rates in the Human Genome: Importance of GC Content, Gene Density, and Telomere-Specific Effects

This study presents the first global, 1-Mbp-level analysis of patterns of nucleotide substitution along the human lineage. The study is based on the analysis of a large number of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. The analysis revealed substantial and consistent variability in rates of substitution, with rates differing by up to twofold among regions. The rates of substitution of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates, suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp. Our analysis revealed that the GC content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution near telomeres appears very different from that in the rest of the genome and cannot be explained by the genome-wide correlations of substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC content of sequences within 10–15 Mbp of the telomere.
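As a rough illustration of the windowed comparison described above, the following minimal Python sketch computes the GC fraction of consecutive, non-overlapping windows and correlates it with per-window substitution rates. It is not the authors' pipeline: the function names `gc_content_by_window` and `pearson_r`, the toy sequence, and the rate values are all hypothetical, and a plain Pearson correlation is used here only as an assumed stand-in for whatever statistic the study applied.

```python
from math import sqrt


def gc_content_by_window(seq, window=1_000_000):
    """GC fraction of each consecutive, non-overlapping window of `seq`.

    Bases other than A, C, G, T are ignored. A window without any
    unambiguous base yields None. The final window may be shorter
    than `window` and is still reported.
    """
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        fractions.append(gc / (gc + at) if gc + at else None)
    return fractions


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy input: three 4-bp "windows" instead of 1-Mbp ones (assumption).
toy_seq = "GGCC" + "ATAT" + "GCAT"
gc = gc_content_by_window(toy_seq, window=4)  # [1.0, 0.0, 0.5]

# Hypothetical per-window substitution rates, invented for illustration.
toy_rates = [0.3, 0.1, 0.2]
r = pearson_r(gc, toy_rates)
```

With real data, the window size would be 1 Mbp and the rates would come from substitutions inferred in repetitive elements; windows returning `None` would need to be dropped before computing the correlation.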
