Statistical tests of models of DNA substitution

Summary

Penny et al. have written that “The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement.” The ability to reject models is of great importance because the results of all phylogenetic analyses depend on their underlying models: to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests.
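The general idea of assessing significance by Monte Carlo simulation can be sketched as follows. This is only an illustrative toy, not the procedure of the paper: the null model (equal base frequencies at independent sites), the likelihood-ratio statistic, and all function names are assumptions chosen to keep the example self-contained. Because the toy null model has no free parameters, nothing needs to be re-estimated for each simulated data set; a model with free parameters would need them fitted to the observed data and refitted to every simulated replicate.

```python
import math
import random
from collections import Counter

BASES = "ACGT"


def log_likelihood_ratio(seq):
    """Return 2 * (unconstrained multinomial log-likelihood
    minus the log-likelihood under equal base frequencies).

    Hypothetical statistic chosen for illustration only.
    """
    n = len(seq)
    counts = Counter(seq)
    # Unconstrained model: each base frequency is its observed proportion.
    unconstrained = sum(c * math.log(c / n) for c in counts.values())
    # Toy null model: every base has probability 0.25.
    null = n * math.log(0.25)
    return 2.0 * (unconstrained - null)


def simulate_null_stat(length):
    """Build a function that draws one data set from the toy null
    model and returns its test statistic."""
    def draw(rng):
        seq = "".join(rng.choice(BASES) for _ in range(length))
        return log_likelihood_ratio(seq)
    return draw


def monte_carlo_p_value(observed, draw_stat, n_sims, rng):
    """Monte Carlo significance level: the rank of the observed
    statistic among n_sims statistics simulated under the null model.
    The observed value is counted as one of the n_sims + 1 values."""
    exceed = sum(1 for _ in range(n_sims) if draw_stat(rng) >= observed)
    return (exceed + 1) / (n_sims + 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# A strongly A-rich sequence, which the toy null model should reject.
skewed_seq = "A" * 70 + "C" * 10 + "G" * 10 + "T" * 10
skewed_stat = log_likelihood_ratio(skewed_seq)
skewed_p = monte_carlo_p_value(
    skewed_stat, simulate_null_stat(len(skewed_seq)), 999, rng
)

# A perfectly balanced sequence, which the toy null model should not reject.
balanced_seq = "ACGT" * 25
balanced_stat = log_likelihood_ratio(balanced_seq)
balanced_p = monte_carlo_p_value(
    balanced_stat, simulate_null_stat(len(balanced_seq)), 999, rng
)

print(f"skewed:   statistic = {skewed_stat:.2f}, p = {skewed_p:.3f}")
print(f"balanced: statistic = {balanced_stat:.2f}, p = {balanced_p:.3f}")
```

With 999 simulations the smallest attainable significance level is 1/1000, so the number of simulated replicates sets the resolution of the test.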
