Computational advances in maximum likelihood methods for molecular phylogeny.

We have developed a generalization of Kimura's Markov chain model for base substitution at a single nucleotide site. This generalized model incorporates more flexible transition rates and consequently allows irreversible as well as reversible chains. Because the model embodies just the right amount of symmetry, it permits explicit calculation of finite-time transition probabilities and equilibrium distributions. The model also meshes well with maximum likelihood methods for phylogenetic analysis. Quick calculation of likelihoods and their derivatives can be carried out by adapting Baum's forward and backward algorithms from the theory of hidden Markov chains. Analysis of HIV sequence data illustrates the speed of the algorithms on trees with many contemporary taxa. Analysis of some of Lake's data on the origin of the eukaryotic nucleus contrasts the reversible and irreversible versions of the model.
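As a rough illustration of the two computational ingredients named above (closed-form finite-time transition probabilities and a recursive likelihood pass over a tree), the following sketch uses Kimura's original two-parameter model, not the generalized model, whose rate structure is not given here. The names `k2p_matrix`, `partial_likelihoods`, and `site_likelihood`, the nested-list tree encoding, and the rate symbols `alpha` (transitions) and `beta` (each transversion) are assumptions made for this example. The upward recursion is the standard pruning-style pass, which plays a role analogous to the backward algorithm for hidden Markov chains; it is not claimed to be the authors' implementation.

```python
import math

BASES = "AGCT"  # purines A, G first; pyrimidines C, T second

def k2p_matrix(alpha, beta, t):
    """Finite-time transition probabilities for Kimura's two-parameter model.

    alpha: rate of transitions (A<->G, C<->T); beta: rate of each transversion.
    Returns a 4x4 nested list indexed in BASES order.
    """
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    same = 0.25 + 0.25 * e1 + 0.5 * e2
    transition = 0.25 + 0.25 * e1 - 0.5 * e2
    transversion = 0.25 - 0.25 * e1
    partner = {0: 1, 1: 0, 2: 3, 3: 2}  # transition partner of each base
    P = [[transversion] * 4 for _ in range(4)]
    for i in range(4):
        P[i][i] = same
        P[i][partner[i]] = transition
    return P

def partial_likelihoods(node, alpha, beta):
    """Upward pass: probability of the data below `node` given each base at `node`.

    A leaf is a one-character string; an internal node is a list of
    (child, branch_length) pairs (a hypothetical encoding for this sketch).
    """
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        P = k2p_matrix(alpha, beta, t)
        child_vec = partial_likelihoods(child, alpha, beta)
        for i in range(4):
            result[i] *= sum(P[i][j] * child_vec[j] for j in range(4))
    return result

def site_likelihood(tree, alpha, beta):
    """Likelihood of one site, using the uniform equilibrium distribution at the root."""
    return sum(0.25 * v for v in partial_likelihoods(tree, alpha, beta))

# Example: three taxa, one site.
tree = [([("A", 0.1), ("G", 0.2)], 0.05), ("A", 0.3)]
print(site_likelihood(tree, alpha=2.0, beta=1.0))
```

Because this particular chain is reversible with a uniform equilibrium distribution, the likelihood does not depend on where the root is placed; an irreversible chain of the kind the generalized model allows would not have that property.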
