A simple method for estimating the parameter of substitution rate variation among sites.

When rate variation among sites is described by a gamma distribution, an important problem is how to estimate the shape parameter alpha, which indexes the degree of among-site rate variation (the smaller the alpha, the stronger the variation). Parsimony-based methods for estimating alpha are simple but biased: alpha tends to be overestimated. Likelihood-based methods, on the other hand, are asymptotically unbiased but computationally very expensive. In this paper, we develop a new method that addresses this problem: we first estimate the expected number of substitutions at each site, corrected for multiple hits, and then estimate alpha from these per-site estimates. Our method is computationally as fast as the parsimony method, and its estimation accuracy is much higher than that of parsimony and similar to that of the likelihood method.
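The second step, going from per-site substitution numbers to alpha, can be illustrated with a minimal sketch. It assumes that the per-site numbers follow a negative binomial distribution (Poisson substitutions with gamma-distributed rates) and uses the classical method-of-moments estimator, alpha = m^2 / (v - m), where m and v are the mean and variance across sites. This is a swapped-in illustration, not necessarily the estimator used in the paper; the function name and the example counts are hypothetical.

```python
import math


def estimate_alpha_moments(subs_per_site):
    """Method-of-moments estimate of the gamma shape parameter alpha.

    Assumption (not taken from the paper): the per-site numbers of
    substitutions are negative binomial with mean m and variance
    v = m + m**2 / alpha, so alpha = m**2 / (v - m).
    """
    n = len(subs_per_site)
    if n < 2:
        raise ValueError("need at least two sites")
    m = sum(subs_per_site) / n
    # Unbiased sample variance across sites.
    v = sum((x - m) ** 2 for x in subs_per_site) / (n - 1)
    if v <= m:
        # No overdispersion relative to Poisson: rates look uniform.
        return math.inf
    return m * m / (v - m)


# Hypothetical per-site substitution numbers (already corrected for
# multiple hits in the real procedure; here they are made-up values).
example_counts = [0, 0, 1, 1, 2, 5, 0, 3, 0, 8]
alpha_hat = estimate_alpha_moments(example_counts)
```

A small estimate indicates that a few sites account for most of the substitutions, whereas a result of infinity means the counts show no more spread than a single uniform rate would produce.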
