Upper bounds on maximum likelihood for phylogenetic trees

We introduce a mechanism for analytically deriving upper bounds on the maximum likelihood for genetic sequence data on sets of phylogenies. A simple 'partition' bound is introduced for general models. Tighter bounds are developed for the simplest model of evolution, the two-state symmetric model of nucleotide substitution under the molecular clock. This follows earlier theoretical work, which analytic complexity has restricted to this model. A weakness of current numerical computation is that reported 'maximum likelihood' results cannot be guaranteed, either for a specified tree (because of the possibility of multiple maxima) or over the full tree space (because the computation is intractable for large sets of trees). The bounds we develop here can be used to conclusively eliminate large proportions of tree space in the search for the maximum likelihood tree, which is vital for developing a branch and bound search strategy to identify that tree. We report the results of a simulation study of approximately 10^6 data sets generated on clock-like trees with five leaves. In each trial, the likelihood value of one specific instance of a parameterised tree is compared with the bound determined for each of the 105 possible rooted binary trees. The proportion of trees eliminated from the search for the maximum likelihood tree ranged from 92% to almost 98%, indicating a computational speed-up factor of between 12 and 44.
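
The two figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only and is not code from the paper: the function names are hypothetical, and it assumes that the speed-up factor is simply the reciprocal of the fraction of trees that survive elimination, which is an interpretation rather than something the abstract states. The tree count uses the standard formula (2n-3)!! for rooted binary trees on n labelled leaves.

```python
def rooted_binary_tree_count(n_leaves: int) -> int:
    """Count rooted binary trees on n labelled leaves via (2n-3)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3 (empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


def speedup_factor(eliminated_fraction: float) -> float:
    """Assumed speed-up: reciprocal of the fraction of trees still to search."""
    return 1.0 / (1.0 - eliminated_fraction)


if __name__ == "__main__":
    # Five leaves give the 105 rooted binary trees mentioned in the text.
    print(rooted_binary_tree_count(5))
    # Eliminating 92% of trees leaves 8%, a factor of about 12.5.
    print(speedup_factor(0.92))
```

Under this assumption, the reported upper factor of 44 corresponds to eliminating roughly 97.7% of the trees, which is consistent with "almost 98%".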
