Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics.

Likelihood ratio tests (LRTs) for comparing models of sequence evolution have become popular over the last few years (Goldman 1993; Yang, Goldman, and Friday 1994, 1995; Huelsenbeck and Crandall 1997; Huelsenbeck and Rannala 1997). In their simplest form, such tests compare a simpler null hypothesis (H0) with a more complex alternative hypothesis (H1) that is a generalization of H0. H0 can be derived from H1 by fixing one or more of its free parameters at particular values, and the hypotheses are then described as nested. Although it is also possible to test non-nested models (Goldman 1993), nested models are often preferred, as statistical tests are simpler to perform and their results can be easier to interpret. The test statistic for an LRT can be written as $2\delta = 2\ln(\hat{L}_{H_1}/\hat{L}_{H_0}) = 2(\ln\hat{L}_{H_1} - \ln\hat{L}_{H_0})$, where $\hat{L}_{H_0}$ and $\hat{L}_{H_1}$ are the maximum-likelihood (ML) scores under hypotheses H0 and H1, respectively. This statistic measures how much improvement H1 gives over H0, and when the hypotheses are nested, $2\delta$ will always be nonnegative. For these nested hypotheses, and under certain regularity conditions, the asymptotic distribution of $2\delta$ (i.e., for large amounts of data) will be $\chi^2_k$. Here, k is the number of degrees of freedom by which H0 and H1 differ, that is, the number of free parameters of H1 whose values must be fixed to derive H0 (Wald 1949; Silvey 1975; Felsenstein 1981; Goldman 1993; Yang, Goldman, and Friday 1994, 1995). (In effect, each free parameter contributes a $\chi^2_1$ variate to the distribution of $2\delta$, with the sum of k independent $\chi^2_1$ variates being distributed as $\chi^2_k$.) Statistical tests assessed using such $\chi^2$ distributions have now become a widespread and useful tool in phylogenetics (Huelsenbeck and Crandall 1997; Huelsenbeck and Rannala 1997).
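The standard test described above can be illustrated with a minimal Python sketch that uses only the standard library. It computes the LRT statistic (twice the difference in maximized log-likelihoods) and its upper-tail probability under a chi-square distribution with k degrees of freedom. The function names `chi2_sf` and `lrt` and the log-likelihood values in the usage lines are hypothetical and are not taken from the paper. The sketch also assumes that the regularity conditions hold, so that the chi-square approximation applies.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X >= x) for X ~ chi-square with integer k >= 1."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two base cases: k = 1 (odd) and k = 2 (even).
    if k % 2 == 1:
        q, df = math.erfc(math.sqrt(half)), 1
    else:
        q, df = math.exp(-half), 2
    # Recurrence: Q(x; df + 2) = Q(x; df) + (x/2)^(df/2) e^(-x/2) / Gamma(df/2 + 1).
    while df < k:
        q += math.exp((df / 2.0) * math.log(half) - half - math.lgamma(df / 2.0 + 1.0))
        df += 2
    return q


def lrt(lnL_h0, lnL_h1, k):
    """Return (statistic, p-value) for nested hypotheses differing by k parameters."""
    stat = 2.0 * (lnL_h1 - lnL_h0)  # nonnegative when H0 is nested in H1
    return stat, chi2_sf(stat, k)


# Hypothetical ML log-likelihood scores, for illustration only.
stat, p_value = lrt(lnL_h0=-1000.0, lnL_h1=-995.0, k=1)
print(stat, p_value)
```

The tail probability is built from the closed forms for one and two degrees of freedom plus a two-step recurrence, which avoids any dependency outside the standard library. A statistics package would normally be used instead.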
Recently, there has been renewed interest in testing whether the predicted $\chi^2$ distribution gives a reliable estimate of the true distribution of $2\delta$ under realistic conditions (e.g., with finite sequence lengths). Whelan and Goldman (1999) investigated cases in which the competing hypotheses were different models of nucleotide substitution. Under three specimen experimental designs (representing realistic phylogenies and nucleotide substitution processes), we found that the $\chi^2$ distribution was acceptable for performing tests of the significance of parameters describing the relative rate of transition

[1] T. Jukes. Chapter 24: Evolution of protein molecules. 1969.

[2] Z. Yang, et al. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. 1993, Molecular Biology and Evolution.

[3] N. Goldman, et al. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. 1994, Molecular Biology and Evolution.

[4] B. Rannala, et al. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. 1997, Science.

[5] Z. Yang, et al. Among-site rate variation and its impact on phylogenetic analyses. 1996, Trends in Ecology & Evolution.

[6] A. Wald. Note on the consistency of the maximum likelihood estimate. 1949.

[7] Nick Goldman, et al. Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem. 1995.

[8] S. Jeffery. Evolution of protein molecules. 1979.

[9] Ziheng Yang, et al. PAML: a program package for phylogenetic analysis by maximum likelihood. 1997, Comput. Appl. Biosci.

[10] H. Kishino, et al. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. 2000, Molecular Biology and Evolution.

[11] John P. Huelsenbeck, et al. Variation in the pattern of nucleotide substitution across sites. 1999, Journal of Molecular Evolution.

[12] D. Andrews. Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. 2000.

[13] K. Liang, et al. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. 1987.

[14] Simon Whelan, et al. Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics. 1999.