Distribution of Base Pair Repeats in Coding and Noncoding DNA Sequences

We analyze histograms of the lengths of dimeric tandem repeats (runs of consecutive copies of an identical dimer) for each of the 16 possible dimers in DNA sequences. For coding regions, the probability of finding a repetitive sequence of ℓ copies of a particular dimer decreases exponentially as ℓ increases, and the distributions are well fit by a first-order Markov process. For noncoding regions, the distribution functions for most of the 16 dimers have long tails and can be approximated by power-law functions. We propose a model, based on known biophysical processes, that leads to the observed probability distribution functions for noncoding DNA. We argue that the difference in the shape of the distribution functions between coding and noncoding DNA arises because noncoding DNA is more tolerant of evolutionary mutational alterations than coding DNA. [S0031-9007(97)04907-7]
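
As a rough illustration of the measurement described in the abstract, the following Python sketch tabulates how often each repeat length occurs for a given dimer. The function names (`dimer_repeat_histogram`, `all_dimer_histograms`) and the toy sequence are hypothetical, and the counting convention (maximal, non-overlapping runs found by a left-to-right scan) is an assumption made for illustration, not necessarily the procedure used by the authors.

```python
import re
from collections import Counter
from itertools import product


def dimer_repeat_histogram(seq, dimer):
    """Count maximal runs of `dimer` in `seq`, keyed by number of copies.

    Assumption: runs are found by a left-to-right, non-overlapping scan.
    """
    pattern = re.compile(f"(?:{re.escape(dimer)})+")
    # Each match is a run of whole dimers, so its length is 2 * copies.
    return Counter(len(m.group(0)) // 2 for m in pattern.finditer(seq))


def all_dimer_histograms(seq):
    """Return one histogram per dimer, for all 16 dimers over A, C, G, T."""
    return {
        a + b: dimer_repeat_histogram(seq, a + b)
        for a, b in product("ACGT", repeat=2)
    }


# Toy sequence (hypothetical): contains AC repeated 3 times, then 2 times.
toy = "ACACACGGTTACACGT"
hist = dimer_repeat_histogram(toy, "AC")
print(hist)
```

Dividing each count by the total number of runs would give an empirical probability for each repeat length, which could then be compared on semi-log axes (exponential decay) or log-log axes (power-law tail).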
