Analytic Solutions for Three-Taxon MLMC Trees with Variable Rates Across Sites

We consider the problem of finding the maximum likelihood rooted tree under a molecular clock (MLMC) for three species and 2-state characters under a symmetric model of substitution. For identically distributed rates per site this is probably the simplest phylogenetic estimation problem, and it is readily solved numerically. Analytic solutions, on the other hand, were obtained only recently (Yang, 2000). In this work we provide analytic solutions for any distribution of rates across sites, provided the moment generating function of the distribution is strictly increasing over the negative real numbers. This class of distributions includes, among others, identical rates across sites, as well as the Gamma, the uniform, and the inverse Gaussian distributions. Our work therefore generalizes Yang's solution. In addition, our derivation of the analytic solution is substantially simpler: it employs the Hadamard conjugation (Hendy and Penny, 1993) and the convexity of an entropy-like function.
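To make the "readily solved numerically" remark concrete, here is a minimal sketch of a numerical MLMC fit for three taxa with identical rates across sites. It is not the analytic method of this work, only a brute-force grid search. The names `log_likelihood` and `fit_mlmc`, the count keys, and the grid size are hypothetical, and edge lengths are assumed to be scaled so that the two ends of an edge of length t differ with probability (1 - exp(-2t))/2.

```python
import math

def log_likelihood(x, y, n_const, n_out, n_in):
    """Log-likelihood of a clock tree ((p,q),outgroup) in the variables
    x = exp(-4*t1), y = exp(-4*t2), with 0 <= y <= x < 1, where t1 is the
    height of the (p,q) ancestor and t2 is the height of the root."""
    p_const = (1 + x + 2 * y) / 4  # all three taxa share a state
    p_out = (1 + x - 2 * y) / 4    # only the outgroup differs
    p_in = (1 - x) / 4             # one given sister taxon differs
    return (n_const * math.log(p_const)
            + n_out * math.log(p_out)
            + n_in * math.log(p_in))

def fit_mlmc(counts, grid=200):
    """Grid search over the three rooted topologies.

    counts['const'] is the number of constant sites; counts['a'] is the
    number of sites where taxon a differs from the other two (same for
    'b' and 'c'). Returns (outgroup, t1, t2, log-likelihood)."""
    best = None
    for outgroup in "abc":
        n_out = counts[outgroup]
        n_in = sum(counts[t] for t in "abc" if t != outgroup)
        for i in range(1, grid):        # x in (0, 1), keeps p_in > 0
            x = i / grid
            for j in range(0, i + 1):   # clock constraint: y <= x
                y = j / grid
                ll = log_likelihood(x, y, counts["const"], n_out, n_in)
                if best is None or ll > best[3]:
                    best = (outgroup, x, y, ll)
    outgroup, x, y, ll = best
    t1 = -math.log(x) / 4
    t2 = -math.log(y) / 4 if y > 0 else math.inf  # y = 0: infinitely old root
    return outgroup, t1, t2, ll

result = fit_mlmc({"const": 70, "a": 5, "b": 5, "c": 20})
print(result)
```

Under a rates-across-sites model in which a site's rate multiplies every edge length, each term exp(-4t) in this sketch would be replaced by M(-4t), where M is the moment generating function of the rate distribution. That is where the monotonicity condition on M enters.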

[1] A. Dress, et al. Reconstructing the shape of a tree from observed dissimilarity data, 1986.

[2] P. Erdös, et al. A few logs suffice to build (almost) all trees (I): part I, 1997.

[3] B. Chor, et al. Multiple maxima of likelihood in phylogenetic trees: an analytic approach, 2000, RECOMB '00.

[4] Z. Yang, et al. Complexity of the simplest phylogenetic estimation problem, 2000, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[5] J. Neyman. Molecular studies of evolution: a source of novel statistical problems, 1971.

[6] Tandy J. Warnow, et al. A Few Logs Suffice to Build (almost) All Trees: Part II, 1999, Theor. Comput. Sci.

[7] F. MacWilliams, et al. The Theory of Error-Correcting Codes, 1977.

[8] Dan Pelleg, et al. Constructing Phylogenies from Quartets: Elucidation of Eutherian Superordinal Relationships, 1998, J. Comput. Biol.

[9] D. Penny, et al. Hadamard conjugations and modeling sequence evolution with unequal rates across sites, 1997, Molecular Phylogenetics and Evolution.

[10] R. Gallager. Information Theory and Reliable Communication, 1968.

[11] D. Penny, et al. Spectral analysis of phylogenetic data, 1993.

[12] M. Kimura, et al. The neutral theory of molecular evolution, 1983, Scientific American.

[13] D. Penny, et al. A discrete Fourier analysis for evolutionary trees, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[14] S. J. Willson. Measuring inconsistency in phylogenetic trees, 1998, Journal of Theoretical Biology.

[15] K. Strimmer, et al. Quartet Puzzling: A Quartet Maximum-Likelihood Method for Reconstructing Tree Topologies, 1996.