On the dispersion index of a Markovian molecular clock

The number of nucleotide substitutions accumulated in a gene or in a lineage is an important random variable in the study of molecular evolution. Of particular interest is the ratio of its variance to its mean, often called the dispersion index. Because nucleotide substitution is most commonly modeled by a continuous-time four-state Markov chain, this paper provides a systematic method for computing the dispersion index exhibited by such a chain. Using this method together with computer algebra and Monte Carlo simulation, the paper presents conjectures that are partially proven and supported by extensive computer experiments. It is conjectured that the Tamura model, the equal-input model and the Takahata-Kimura model always exhibit dispersion indices less than 2. It is also conjectured that a general four-state model can be chosen to exhibit a dispersion index of any desired magnitude, although the probability that a randomly chosen such model exhibits a dispersion index greater than 2 is only about 2%. The relevance of these findings to the neutral theory is discussed.
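As a rough illustration of what such a computation involves, the sketch below makes a few assumptions. N(t), the number of substitutions, is treated as the number of jumps of a continuous-time Markov chain that starts in its stationary distribution. The chain has generator Q, stationary distribution pi and exit rates r_i = -Q_ii. The sketch uses a standard result for counting the jumps of a Markov chain, which is not necessarily the method of this paper. With lambda = sum_i pi_i r_i, the long-time dispersion index is R = 1 - 2 lambda + (2 / lambda) sum_i pi_i r_i x_i, where x solves (1 pi - Q) x = r and 1 pi is the matrix whose rows all equal pi. A small Monte Carlo simulation is included as a cross-check. The example generator Q_EXAMPLE and all function names are hypothetical, and only the Python standard library is used.

```python
import random

# Hypothetical example generator (off-diagonal entries are substitution rates; rows sum to zero).
Q_EXAMPLE = [
    [-1.0, 0.2, 0.6, 0.2],
    [0.1, -0.5, 0.1, 0.3],
    [1.5, 0.3, -2.0, 0.2],
    [0.2, 0.4, 0.2, -0.8],
]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def stationary(Q):
    """Stationary row vector pi of an irreducible generator Q: pi Q = 0, sum(pi) = 1."""
    n = len(Q)
    # pi (Q + J) = (1, ..., 1) with J the all-ones matrix; solve the transposed system.
    return solve([[Q[j][i] + 1.0 for j in range(n)] for i in range(n)], [1.0] * n)


def asymptotic_dispersion_index(Q):
    """Large-t limit of Var N(t) / E N(t), N(t) = number of jumps of the stationary chain."""
    n = len(Q)
    pi = stationary(Q)
    rate = [-Q[i][i] for i in range(n)]          # total substitution rate out of each state
    lam = sum(p * q for p, q in zip(pi, rate))   # mean substitutions per unit time
    # Solve (1 pi - Q) x = rate; the matrix 1 pi has every row equal to pi.
    x = solve([[pi[j] - Q[i][j] for j in range(n)] for i in range(n)], rate)
    cross = sum(pi[i] * rate[i] * x[i] for i in range(n))
    return 1.0 - 2.0 * lam + 2.0 * cross / lam


def simulate_dispersion_index(Q, t, reps, seed=0):
    """Monte Carlo estimate of Var N(t) / E N(t), starting each path from stationarity."""
    rng = random.Random(seed)
    n = len(Q)
    pi = stationary(Q)
    targets = [[j for j in range(n) if j != i] for i in range(n)]
    weights = [[Q[i][j] for j in targets[i]] for i in range(n)]
    counts = []
    for _ in range(reps):
        state = rng.choices(range(n), weights=pi)[0]
        clock, jumps = 0.0, 0
        while True:
            clock += rng.expovariate(-Q[state][state])  # exponential holding time
            if clock > t:
                break
            jumps += 1
            state = rng.choices(targets[state], weights=weights[state])[0]
        counts.append(jumps)
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)  # sample variance
    return var / mean


if __name__ == "__main__":
    print("exact (t -> infinity):", asymptotic_dispersion_index(Q_EXAMPLE))
    print("Monte Carlo (t = 200):", simulate_dispersion_index(Q_EXAMPLE, 200.0, 2000))
```

When all exit rates are equal, as in the Jukes-Cantor model, N(t) is a Poisson process and the formula gives R = 1. For a two-state chain with rates a and b it reduces to 2(a^2 + b^2)/(a + b)^2, which lies between 1 and 2. The finite-t index estimated by simulation approaches the exact limit as t grows.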