Evolution of Simple Sequence Repeats

Simple sequence repeats (SSRs) are common and frequently polymorphic in eukaryotic DNA. Many are subject to high rates of length mutation, most often the gain or loss of a single repeat unit. Can the observed abundances of SSRs and their length distributions be explained as the result of an unbiased random walk starting from some initial repeat length? To address this question, we consider two models of an unbiased random walk on the integers n, with n ≥ n0. The first is a continuous-time process (Birth and Death Model, or BDM) in which the probability per unit time of a transition to n + 1 or n - 1 is λk, with k = n - n0 + 1. The second is a discrete-time model (Random Walk Model, or RWM) in which a transition is made at each time step, either to n - 1 or to n + 1. In each case the walks start at length n0, and new walks are generated at a steady rate S, the source rate, which is determined by the rate of base-substitution mutation from neighboring sequences. Each walk terminates when n reaches n0 - 1 or at some time T, which reflects the contamination of pure repeat sequences by other mutations that remove them from consideration, either because they no longer satisfy the criteria used to select repeats from a database or because they can no longer undergo efficient length mutation. For infinite T, the results for N(k), the expected number of repeats of length n = k + n0 - 1, are particularly simple: N(k) = S/(kλ) for BDM and N(k) = 2S for RWM. For finite T, each model has a cut-off value of k, namely k = Tλ ln 2 for BDM and k = 0.57 √T for RWM; for larger values of k, N(k) rapidly falls below the infinite-time limit. We argue that these results may be compared with SSR length distributions averaged over many loci, but not with the distribution at a particular locus, for which founder effects are important.
For the data of Beckmann & Weber [(1992), Genomics 12, 627] on GT.AC repeats in humans, each model gives a reasonable fit, with the source at two repeat units (n0 = 2). Both the absolute number of loci and their length distribution are well represented.
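The RWM result can be illustrated numerically. The sketch below is not the authors' calculation; it is a minimal check under stated assumptions: each walk starts at k = 1 at age 0, steps to k - 1 or k + 1 with probability 1/2 each, is absorbed at k = 0, and is removed after age T. In steady state, with S new walks per time step, the expected number of repeats at k is S times the expected number of time steps a single walk spends at k. The function name `rwm_expected_counts` and the truncation parameter `k_max` are hypothetical, introduced only for this illustration.

```python
def rwm_expected_counts(T, S=1.0, k_max=None):
    """Expected steady-state counts N(k) for the discrete Random Walk Model.

    Assumptions (illustrative, not taken from the paper's derivation):
      - a walk starts at k = 1 (length n0) at age 0,
      - at each step it moves to k - 1 or k + 1 with probability 1/2,
      - it is absorbed at k = 0 (length n0 - 1) and removed after age T,
      - S new walks are created per time step.
    Returns a list indexed by k; entry 0 is unused and stays 0.
    """
    if k_max is None:
        k_max = T + 1  # a walk of age <= T cannot exceed k = T + 1
    # p[k] = probability that a walk of the current age sits at k (k >= 1)
    p = [0.0] * (k_max + 2)
    p[1] = 1.0
    occupancy = [0.0] * (k_max + 2)
    for _age in range(T + 1):
        # accumulate the expected number of time steps spent at each k
        for k in range(1, k_max + 1):
            occupancy[k] += p[k]
        # advance the distribution by one unbiased step
        nxt = [0.0] * (k_max + 2)
        for k in range(1, k_max + 1):
            half = 0.5 * p[k]
            if k > 1:
                nxt[k - 1] += half   # a step down from k = 1 is absorbed and dropped
            if k + 1 <= k_max:
                nxt[k + 1] += half   # mass stepping beyond k_max is discarded (truncation)
        p = nxt
    return [S * x for x in occupancy]


if __name__ == "__main__":
    N = rwm_expected_counts(T=2000, S=1.0, k_max=250)
    for k in (1, 10, 50, 100):
        print(k, round(N[k], 3))
```

For small k the computed N(k) approaches the infinite-time value 2S as T grows, and it falls off for k well beyond a scale proportional to √T, in line with the cut-off behavior described above. For the BDM, if each of the two transitions is taken to occur at rate λk (an interpretation of the model statement, not something the abstract spells out), the sequence of states visited is the same unbiased walk and each visit to k lasts 1/(2λk) on average, so two expected visits give an occupancy of 1/(kλ) per walk, consistent with N(k) = S/(kλ).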