Maximum Likelihood of Evolutionary Trees Is Hard

Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees (Felsenstein, 1981). Finding optimal ML trees appears to be a very hard computational task, but for tractable cases, ML is the method of choice. In practice, algorithms and heuristics for ML take longer to run than those for the second major character-based criterion, maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years (Day, Johnson and Sankoff, 1986; by reduction from vertex cover), such a hardness result for ML has so far eluded researchers in the field. An important work by Tuffley and Steel (1997) proves quantitative relations between parsimony values and the corresponding log-likelihood values. However, a direct application of their result would only give an exponential-time reduction from MP to ML. Another step in this direction has recently been made by Addario-Berry et al. (2004), who proved that ancestral maximum likelihood (AML) is NP-complete. AML “lies in between” the two problems, having some properties of MP and some properties of ML. We resolve the question, showing that “regular” ML on phylogenetic trees is indeed intractable. Our reduction follows those for MP and AML, but starts from an approximation version of vertex cover, known as gap VC. The crux of our work is not the reduction itself, but its correctness proof. The proof goes through a series of tree modifications, controlling the likelihood loss at each step by means of the bounds of Tuffley and Steel. It can be viewed as correlating the value of any ML solution with an arbitrarily close approximation to vertex cover.
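To make the quantity being optimized concrete, the following is a minimal Python sketch of how the likelihood of a set of aligned binary sequences can be evaluated on one fixed tree, using the pruning recursion of Felsenstein (1981). It is an illustration under stated assumptions, not the construction used in the hardness proof: the symmetric two-state substitution model, the uniform root distribution, and the names `site_likelihood`, `log_likelihood`, `tree`, and `p` are all choices made here for the example. The ML problem discussed above asks for the tree and edge parameters that maximize this value, which is where the difficulty lies.

```python
import math

def site_likelihood(tree, root, leaf_states):
    """Likelihood of one site on a fixed rooted tree (pruning recursion).

    tree: dict mapping each internal node to a list of (child, p) pairs,
          where p is the assumed probability of a state change on that edge.
    leaf_states: dict mapping each leaf name to its observed state (0 or 1).
    """
    def partial(node):
        # Returns [L0, L1]: probability of the observed leaves below `node`,
        # conditioned on `node` being in state 0 or state 1.
        if node not in tree:  # leaf: the observed state has probability 1
            s = leaf_states[node]
            return [1.0 if s == 0 else 0.0, 1.0 if s == 1 else 0.0]
        result = [1.0, 1.0]
        for child, p in tree[node]:
            c = partial(child)
            for a in (0, 1):
                # Stay in state a with probability 1 - p, flip with probability p.
                result[a] *= (1.0 - p) * c[a] + p * c[1 - a]
        return result

    l0, l1 = partial(root)
    return 0.5 * l0 + 0.5 * l1  # assumed uniform distribution at the root

def log_likelihood(tree, root, sequences):
    """Sum of per-site log likelihoods, treating sites as independent."""
    n_sites = len(next(iter(sequences.values())))
    total = 0.0
    for i in range(n_sites):
        states = {name: int(seq[i]) for name, seq in sequences.items()}
        total += math.log(site_likelihood(tree, root, states))
    return total

# Hypothetical two-leaf tree with change probability 0.1 on each edge.
example_tree = {"r": [("A", 0.1), ("B", 0.1)]}
same = site_likelihood(example_tree, "r", {"A": 0, "B": 0})
diff = site_likelihood(example_tree, "r", {"A": 0, "B": 1})
```

In this toy tree, a site on which the two leaves agree is more likely than one on which they differ, which is the kind of link between few changes and high likelihood that the Tuffley and Steel bounds quantify. Edge probabilities of exactly 0 can give a site likelihood of 0, in which case `log_likelihood` would fail on `math.log(0)`; the sketch does not guard against that.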

[1]  Johan Håstad, et al.  Some optimal inapproximability results, 2001, JACM.

[2]  T. Jukes.  Chapter 24: Evolution of Protein Molecules, 1969.

[3]  B. Chor, et al.  Multiple maxima of likelihood in phylogenetic trees: an analytic approach, 2000, RECOMB '00.

[4]  David S. Johnson, et al.  The computational complexity of inferring rooted phylogenies by parsimony, 1986.

[5]  J. Neyman.  Molecular studies of evolution: a source of novel statistical problems, 1971.

[6]  P. Berman, et al.  On Some Tighter Inapproximability Results, 1998, Electron. Colloquium Comput. Complex.

[7]  J. Felsenstein.  Evolutionary trees from DNA sequences: A maximum likelihood approach, 1981, Journal of Molecular Evolution.

[8]  J. Felsenstein.  Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods, 1996, Methods in Enzymology.

[9]  R. Graham, et al.  The Steiner problem in phylogeny is NP-complete, 1982.

[10]  Marek Karpinski, et al.  On Some Tighter Inapproximability Results (Extended Abstract), 1999, ICALP.

[11]  M. Nei, et al.  A new method of inference of ancestral nucleotide and amino acid sequences, 1995, Genetics.

[12]  Robin Milner, et al.  On Observing Nondeterminism and Concurrency, 1980, ICALP.

[13]  D. Penny, et al.  Parsimony, likelihood, and the role of models in molecular phylogenetics, 2000, Molecular Biology and Evolution.

[14]  M. Steel.  The complexity of reconstructing trees from qualitative characters and subtrees, 1992.

[15]  Richard A. Goldstein, et al.  Probabilistic reconstruction of ancestral protein sequences, 1996, Journal of Molecular Evolution.

[16]  Mike Steel, et al.  The Maximum Likelihood Point for a Phylogenetic Tree is Not Unique, 1994.

[17]  M. Steel, et al.  Links between maximum likelihood and maximum parsimony under a simple model of site substitution, 1997, Bulletin of Mathematical Biology.

[18]  H. Munro, et al.  Mammalian protein metabolism, 1964.

[19]  Marek Karpinski, et al.  Approximating Bounded Degree Instances of NP-Hard Problems, 2001, FCT.

[20]  W. H. Day.  Computational complexity of inferring phylogenies from dissimilarity matrices, 1987, Bulletin of Mathematical Biology.

[21]  S. Gupta, et al.  Statistical decision theory and related topics IV, 1988.

[22]  W. Fitch.  Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology, 1971.

[23]  David Sankoff, et al.  Computational complexity of inferring phylogenies by compatibility, 1986.

[24]  Alessandro Panconesi, et al.  Ancestral Maximum Likelihood of Evolutionary Trees Is Hard, 2004, J. Bioinform. Comput. Biol.