Maximum likelihood with multiparameter models of substitution

Maximum-likelihood approaches to phylogenetic estimation are potentially very flexible, but current implementations are highly constrained. One such constraint has been the limitation to one-parameter models of substitution. A general implementation of Newton's maximization procedure was developed that allows the maximum likelihood method to be used with multiparameter models. The expectation-maximization (EM) algorithm was also used to obtain a good approximation to the maximum likelihood for a certain class of multiparameter models. The condition under which a multiparameter model has only a single maximum on the likelihood surface was identified. Two- and three-parameter models of substitution in base-paired regions of RNA sequences were used as examples in computer simulations, which showed that these implementations of the maximum likelihood method are not substantially slower than implementations for one-parameter models. Newton's method is much faster than the EM method but may diverge in some circumstances. In these cases the EM method can be used to restore convergence.
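As an illustration of the kind of procedure described above, the following is a minimal sketch of Newton's method applied to a two-parameter likelihood. It is not the implementation from the paper. Several things here are assumptions made for the example: Kimura's two-parameter model for a single pair of DNA sequences stands in for the RNA base-pair models, the derivatives are taken by finite differences, and the safeguard against divergence is a simple step-halving line search with a gradient fallback rather than the EM step, which the abstract does not specify. The function names `k2p_loglik`, `gradient`, `hessian`, and `newton_maximize` are hypothetical.

```python
import numpy as np

def k2p_loglik(params, n_same, n_ts, n_tv):
    """Log-likelihood (up to a constant) of site counts for a sequence pair
    under Kimura's two-parameter model. a and b are the expected numbers of
    transitions and of each transversion type per site (assumed setup)."""
    a, b = params
    if a <= 0.0 or b <= 0.0:
        return -np.inf  # outside the admissible parameter region
    u = np.exp(-4.0 * b)
    v = np.exp(-2.0 * (a + b))
    p_ts = 0.25 * (1.0 + u - 2.0 * v)   # site differs by a transition
    p_tv = 0.5 * (1.0 - u)              # site differs by a transversion
    p_same = 1.0 - p_ts - p_tv          # site is identical
    if not (p_ts > 0.0 and p_tv > 0.0 and p_same > 0.0):
        return -np.inf
    return n_same * np.log(p_same) + n_ts * np.log(p_ts) + n_tv * np.log(p_tv)

def gradient(f, x, h=1e-5):
    """Central-difference approximation to the gradient of f at x."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation to the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def newton_maximize(f, x0, tol=1e-6, max_iter=100):
    """Newton's method for maximizing f, with a crude divergence safeguard."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient(f, x)
        if np.linalg.norm(g) < tol:
            break
        try:
            step = np.linalg.solve(hessian(f, x), -g)  # Newton step
        except np.linalg.LinAlgError:
            step = g
        if step @ g <= 0.0:
            step = g  # Newton step is not an ascent direction: use the gradient
        fx = f(x)
        scale = 1.0
        # Halve the step until the likelihood does not decrease.
        while f(x + scale * step) < fx and scale > 1e-12:
            scale *= 0.5
        x = x + scale * step
    return x

# Example counts (made up for illustration): 1000 aligned sites.
estimate = newton_maximize(lambda x: k2p_loglik(x, 800, 150, 50), [0.2, 0.05])
print(estimate)
```

For this model the maximum-likelihood estimates also have a closed form in terms of the observed proportions of transitions and transversions, which gives a convenient check on the iterative result. Models without such a closed form are where an iterative procedure of this kind becomes necessary.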
