DNA Sequence Evolution with Neighbor-Dependent Mutation

We introduce a model of DNA sequence evolution that accounts for biases in mutation rates that depend on the identity of the neighboring bases. An analytic solution for this class of models is developed by adopting well-known methods of nonlinear dynamics. Results are presented for the CpG-methylation-deamination process, which dominates point substitutions in vertebrates. The dinucleotide frequencies generated by the model, using empirically obtained mutation rates, match the overall pattern observed in noncoding DNA. A web-based tool has been constructed to compute single- and dinucleotide frequencies for arbitrary neighbor-dependent mutation rates. Also provided is the inverse procedure, which infers the mutation rates from observed single- and dinucleotide frequencies by maximum likelihood analysis. Reasonable estimates of the mutation rates can be obtained very efficiently from generic noncoding DNA sequences, once long homonucleotide subsequences have been masked out. Our method is more convenient and versatile than the traditional method of deducing mutation rates by counting mutation events in carefully chosen sequences. More generally, our approach provides a more realistic but still tractable description of noncoding genomic DNA and may be used as a null model for various sequence analysis applications.
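
The inference procedure takes observed single- and dinucleotide frequencies as input, computed after masking long homonucleotide runs. A minimal Python sketch of that preprocessing step follows. It is not the authors' tool: the function names, the run-length threshold `min_run=10`, and the use of `N` as the masking character are all assumptions made for illustration. The last helper computes the observed-to-expected ratio of a dinucleotide, a common summary of CpG depletion that the abstract itself does not define.

```python
import re
from collections import Counter

BASES = "ACGT"

def mask_homonucleotide_runs(seq, min_run=10):
    """Replace each run of a single base with length >= min_run by 'N's.

    The threshold min_run is an illustrative assumption, not a value
    taken from the paper.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq.upper())

def nucleotide_frequencies(seq):
    """Return (single, dinucleotide) frequency dicts, skipping non-ACGT symbols."""
    single = Counter(b for b in seq if b in BASES)
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n1, n2 = sum(single.values()), sum(pairs.values())
    if n1 == 0 or n2 == 0:
        raise ValueError("sequence has no unmasked bases or dinucleotides")
    f1 = {b: single[b] / n1 for b in BASES}
    f2 = {a + b: pairs[a + b] / n2 for a in BASES for b in BASES}
    return f1, f2

def relative_abundance(f1, f2, dinuc):
    """Observed dinucleotide frequency over the product of its base frequencies."""
    a, b = dinuc
    return f2[dinuc] / (f1[a] * f1[b])

if __name__ == "__main__":
    raw = "ACGT" * 3 + "A" * 12 + "CG"
    masked = mask_homonucleotide_runs(raw)
    f1, f2 = nucleotide_frequencies(masked)
    print(masked)
    print(relative_abundance(f1, f2, "CG"))
```

Dinucleotides that overlap a masked position are dropped from the counts, so a masked run does not create an artificial junction between the bases on either side of it.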
