A stochastic model for the evolution of autocorrelated DNA sequences.

Currently used stochastic models of DNA sequence evolution assume that nucleotide sites are independent and identically distributed. They are too simple to account for the dependence structures obviously present in molecular data. Until now, more realistic stochastic models of nucleotide substitution have been considered intractable. In this paper we develop a procedure that accounts for non-overlapping correlations among pairs of sites of a DNA sequence. We show that currently used models, which ignore correlated sites, underestimate the distances inferred from observed sequence dissimilarities. For the mitochondrial sequence data analyzed, this underestimation is not drastic, in contrast to that for the paired regions (stems) of bacterial 23S rRNA sequences.
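
As an illustration of how a distance is inferred from an observed sequence dissimilarity under an independent-sites model, the following minimal sketch uses the Jukes-Cantor (JC69) correction. The abstract does not name a particular model, so the choice of JC69 is an assumption made here for illustration, and the function names `p_distance` and `jc69_distance` are hypothetical. This is not the pair-of-sites procedure developed in the paper.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed dissimilarity: fraction of aligned sites that differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor distance (expected substitutions per site).

    Assumes independent, identically distributed sites with equal
    substitution rates among the four nucleotides (an assumption of
    this sketch, not a model named in the abstract).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jc69_distance(p))
```

The corrected distance is never smaller than the observed dissimilarity, because the correction accounts for multiple substitutions at the same site. It still treats every site as independent, which is the simplification the abstract argues leads to underestimated distances when sites are correlated.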