Maximum likelihood of evolutionary trees: hardness and approximation

MOTIVATION Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees. Yet the computational complexity of ML was open for over 20 years, and was only recently resolved by the authors for the Jukes-Cantor model of substitution and its generalizations: reconstructing the ML tree was proved to be computationally intractable (NP-hard). In this work we explore three directions that extend that result.

RESULTS (1) We show that ML remains NP-hard under the assumption of a molecular clock. (2) We show that the difficulty is not limited to finding the exact ML tree: approximating the logarithm of the maximum likelihood to within any multiplicative factor smaller than 1.00175 is also NP-hard. (3) We develop an algorithm for approximating the log-likelihood under the condition that the input sequences are sparse. It can employ any approximation algorithm for parsimony, and asymptotically achieves the same approximation ratio. We note that ML reconstruction remains hard even for sparse inputs, and that many real datasets satisfy the sparseness condition.
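Result (3) relies on parsimony as a subroutine. As background only, the following minimal sketch shows how a parsimony score (the minimum number of substitutions) can be computed for one fixed rooted binary tree using Fitch's algorithm. It is not the authors' approximation algorithm: the nested-tuple tree encoding, the function names `fitch_site` and `parsimony_score`, and the toy sequences are all assumptions made for illustration.

```python
def fitch_site(tree, states):
    """Fitch's bottom-up pass for a single site.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its character at this site
    Returns (candidate state set at this node, substitutions in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution needed.
        return common, left_cost + right_cost
    # The children disagree: one substitution is charged at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Sum the per-site Fitch costs over aligned, equal-length sequences."""
    length = len(next(iter(seqs.values())))
    total = 0
    for i in range(length):
        site = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical toy data: four aligned sequences of length 3.
seqs = {"a": "AAC", "b": "AAC", "c": "ACA", "d": "ACA"}
tree_ab_cd = (("a", "b"), ("c", "d"))
tree_ac_bd = (("a", "c"), ("b", "d"))
print(parsimony_score(tree_ab_cd, seqs))  # 2
print(parsimony_score(tree_ac_bd, seqs))  # 4
```

On this toy input, the topology that groups identical sequences together needs fewer substitutions. Scoring one given tree in this way takes polynomial time; the hardness results concern the search over all trees.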
