Expansion of tandem repeats and oligomer clustering in coding and noncoding DNA sequences

We review recent studies of the distribution of dimeric tandem repeats and of short oligomer clustering in DNA sequences. We find that the distribution of dimeric tandem repeats in coding DNA is exponential, while in noncoding DNA it often has long power-law tails. We explain this phenomenon using mutation models based on random multiplicative processes. We also develop a clustering measure based on percolation theory that quantifies the degree of clustering of short oligomers. We find that mono-, di-, and tetramers cluster more in noncoding DNA than in coding DNA. However, trimers have some degree of clustering in both coding and noncoding DNA. We relate these phenomena to modes of tandem repeat expansion.
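As an illustration of the kind of statistic discussed above, the following minimal sketch tabulates how often a given dimer occurs as a maximal tandem run of each length in a sequence. It assumes that the "distribution of dimeric tandem repeats" refers to the distribution of repeat lengths, measured in consecutive copies of the dimer. The function name `dimer_repeat_lengths` and the toy sequence are hypothetical and are not taken from the reviewed studies.

```python
from collections import Counter


def dimer_repeat_lengths(seq, dimer):
    """Count maximal tandem runs of `dimer` in `seq`.

    Returns a Counter mapping run length (number of consecutive
    copies of the dimer) to the number of runs of that length.
    Illustrative sketch only; not the authors' procedure.
    """
    if len(dimer) != 2:
        raise ValueError("dimer must have exactly two characters")
    counts = Counter()
    i = 0
    n = len(seq)
    while i + 2 <= n:
        if seq[i:i + 2] == dimer:
            run = 0
            # Extend the run in steps of two bases while the dimer repeats.
            while seq[i:i + 2] == dimer:
                run += 1
                i += 2
            counts[run] += 1
        else:
            i += 1
    return counts


# Toy example: (AC)3, then a single AC, then (AC)2.
print(dimer_repeat_lengths("ACACACGTACTTACAC", "AC"))
```

On real data, one would plot the resulting counts against run length on semi-logarithmic and double-logarithmic axes: an exponential distribution appears as a straight line on the former, a power-law tail as a straight line on the latter.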

[1]  Hanspeter Herzel,et al.  Interpreting correlations in biosequences , 1998 .

[2]  H. Stanley,et al.  Analysis of DNA sequences using methods of statistical physics , 1998 .

[3]  Wentian Li,et al.  The Study of Correlation Structures of DNA Sequences: A Critical Review , 1997, Comput. Chem..

[4]  H. Stanley,et al.  Discrete molecular dynamics studies of the folding of a protein-like model. , 1998, Folding & design.

[5]  Robert I. Richards,et al.  Simple repeat DNA is not replicated simply , 1994, Nature Genetics.

[6]  R. Stallings Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implications for human genetic diseases. , 1994, Genomics.

[7]  P. Lio’,et al.  Analysis of genomic patchiness of Haemophilus influenzae and Saccharomyces cerevisiae chromosomes. , 1996, Journal of theoretical biology.

[8]  Nikolay V. Dokholyan,et al.  Model of unequal chromosomal crossing over in DNA sequences 1 1 This work is supported by NIH-HGP. , 1998 .

[9]  E. Shakhnovich,et al.  Conserved residues and the mechanism of protein folding , 1996, Nature.

[10]  S. Havlin,et al.  Fractals and Disordered Systems , 1991 .

[11]  L A Mirny,et al.  Universality and diversity of the protein folding scenarios: a comprehensive analysis with the aid of a lattice model. , 1996, Folding & design.

[12]  K. Kinzler,et al.  Clues to the pathogenesis of familial colorectal cancer. , 1993, Science.

[13]  S N Thibodeau,et al.  Microsatellite instability in cancer of the proximal colon. , 1993, Science.

[14]  D. Sornette,et al.  Convergent Multiplicative Processes Repelled from Zero: Power Laws and Truncated Power Laws , 1996, cond-mat/9609074.

[15]  A K Konopka,et al.  Distance analysis helps to establish characteristic motifs in intron sequences. , 1987, Gene analysis techniques.

[16]  Darryl Shibata,et al.  Ubiquitous somatic mutations in simple repeated sequences reveal a new mechanism for colonic carcinogenesis , 1993, Nature.

[17]  L. Cavalli-Sforza,et al.  High resolution of human evolutionary trees with polymorphic microsatellites , 1994, Nature.

[18]  GEORGE I. BELL,et al.  Evolution of Simple Sequence Repeats , 1996, Comput. Chem..

[19]  R. Durrett,et al.  Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[20]  Wolfgang Stephan,et al.  The evolutionary dynamics of repetitive DNA in eukaryotes , 1994, Nature.

[21]  P. Gill,et al.  Human VNTR mutation and sex. , 1993, EXS.

[22]  S. Karlin,et al.  What drives codon choices in human genes? , 1996, Journal of molecular biology.

[23]  D. Arquès,et al.  Periodicities in introns. , 1987, Nucleic acids research.

[24]  T. Kunkel Slippery DNA and diseases , 1993, Nature.

[25]  Peter Reynolds,et al.  Ghost fields, pair connectedness, and scaling: exact results in one-dimensional percolation? , 1977 .

[26]  E I Shakhnovich,et al.  Impact of local and non-local interactions on thermodynamics and kinetics of protein folding. , 1995, Journal of molecular biology.

[27]  S. Redner,et al.  Introduction To Percolation Theory , 2018 .

[28]  S. Karlin,et al.  Over- and under-representation of short oligonucleotides in DNA sequences. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[29]  A. Bowcock,et al.  Genetic instability in human ovarian cancer cell lines. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[30]  E. Shakhnovich,et al.  Implications of thermodynamics of protein folding for evolution of primary sequences , 1990, Nature.

[31]  R I Richards,et al.  Simple tandem DNA repeats and human genetic disease. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[32]  J. Jurka,et al.  The Length Distribution of Perfect Dimer Repetitive DNA Is Consistent with Its Evolution by an Unbiased Single-Step Mutation Process , 1997, Journal of Molecular Evolution.

[33]  George I. Bell,et al.  Roles of Repetitive Sequences , 1992, Comput. Chem..

[34]  Wentian Li,et al.  Long-range correlation and partial 1/fα spectrum in a noncoding DNA sequence , 1992 .

[35]  S V Buldyrev,et al.  Quantification of DNA patchiness using long-range correlation measures. , 1997, Biophysical journal.

[36]  G. Gutman,et al.  Slipped-strand mispairing: a major mechanism for DNA sequence evolution. , 1987, Molecular biology and evolution.

[37]  S. Hess,et al.  Characteristics of the large (dA).(dT) homopolymer tracts in D. discoideum gene flanking and intron sequences. , 1993, Journal of biomolecular structure & dynamics.

[38]  J. Davies,et al.  Molecular Biology of the Cell , 1983, Bristol Medico-Chirurgical Journal.

[39]  S. Havlin,et al.  Clustering of identical oligomers in coding and noncoding DNA sequences. , 1999, Journal of biomolecular structure & dynamics.

[40]  S. S. Smith,et al.  Hairpins are formed by the single DNA strands of the fragile X triplet repeats: structure and biological implications. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[41]  R. Wells Molecular Basis of Genetic Instability of Triplet Repeats (*) , 1996, The Journal of Biological Chemistry.

[42]  N V Dokholyan,et al.  Distributions of dimeric tandem repeats in non-coding and coding DNA sequences. , 2000, Journal of theoretical biology.

[43]  Nikolay V. Dokholyan,et al.  Distribution of Base Pair Repeats in Coding and Noncoding DNA Sequences , 1997 .

[44]  J. Weber,et al.  Survey of human and rat microsatellites. , 1992, Genomics.

[45]  J. Mrazek,et al.  Middle-range clustering of nucleotides in genomes , 1995, Comput. Appl. Biosci..

[46]  C. Peng,et al.  Long-range correlations in nucleotide sequences , 1992, Nature.