Simple diagnostic statistical tests of models for DNA substitution

The accuracy of models for DNA substitution used in phylogenetic analyses is becoming more important with the increasing availability and analysis of molecular sequence data. It is natural to look for ways of improving these models, and to do this in a planned manner it is useful to be able to identify features of sequences that may not be described adequately. In this paper, I describe three statistics which may give useful diagnostic information on departures from models' predictions. The statistical distributions of these statistics are discussed and simple significance tests are derived. These tests are based on the (estimated) phylogeny of the sequences and so have the advantage of using the information contained in this tree. Examples are given of the application of the new tests to Markov chain models describing the evolution of primate pseudogene sequences and small-subunit RNA sequences.
