Computing the minimum number of hybridization events for a consistent evolutionary history

It is now well documented that the evolutionary relationships among a set of present-day species are not necessarily tree-like. This is because reticulation events such as hybridization produce species whose genomes are a mixture of genes from different ancestors. Since such events are relatively rare, a fundamental problem for biologists is to determine the smallest number of hybridization events required to explain a given (input) set of data in a single (hybrid) phylogeny. The main results of this paper show that computing this smallest number is APX-hard, and thus NP-hard, when the input is a collection of phylogenetic trees on sets of present-day species. This answers a problem raised at a recent conference (Phylogenetic Combinatorics and Applications, Uppsala University, 2004). As a consequence of these results, we also correct a previously published NP-hardness proof for the case where the input is a collection of binary sequences, each representing the attributes of a particular present-day species. The APX-hardness of these problems means that, unless P = NP, there is no efficient algorithm that either computes the result exactly or approximates it to within an arbitrary degree of accuracy.
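For contrast with the hard general problem, deciding whether zero hybridization events suffice for binary sequence data is easy. Under the assumption that each site changes state at most once, a tree with no reticulation exists exactly when no pair of sites exhibits all four gametes 00, 01, 10 and 11 (the classic four-gamete test). If the ancestral sequence is fixed as all zeros, the forbidden pattern is instead 01, 10 and 11 together. The sketch below is not the method of this paper, and the function names are hypothetical. It only recognizes the zero case and does not compute or approximate the minimum number of events.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def conflicting_site_pairs(
    sequences: Sequence[str], root_all_zero: bool = False
) -> List[Tuple[int, int]]:
    """Return every pair of sites (i, j) that fails the gamete test.

    Hypothetical helper: a conflicting pair means no single tree
    (without hybridization) can explain both sites, assuming each
    site changes state at most once.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    for s in sequences:
        if len(s) != length or set(s) - {"0", "1"}:
            raise ValueError("sequences must be equal-length strings over {0, 1}")

    # Column i holds the state of site i in every sequence.
    columns = [[s[i] for s in sequences] for i in range(length)]

    if root_all_zero:
        # Rooted case: the sets of taxa carrying a 1 must be nested or disjoint.
        forbidden = {("0", "1"), ("1", "0"), ("1", "1")}
    else:
        # Unrooted case: the classic four-gamete condition.
        forbidden = {("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")}

    conflicts = []
    for i, j in combinations(range(length), 2):
        gametes = set(zip(columns[i], columns[j]))
        if forbidden <= gametes:
            conflicts.append((i, j))
    return conflicts


def zero_events_suffice(sequences: Sequence[str], root_all_zero: bool = False) -> bool:
    """True if the data fit a single tree with no hybridization events."""
    return not conflicting_site_pairs(sequences, root_all_zero)


if __name__ == "__main__":
    print(conflicting_site_pairs(["00", "01", "10", "11"]))  # [(0, 1)]
    print(zero_events_suffice(["01", "10", "11"]))  # True
    print(zero_events_suffice(["01", "10", "11"], root_all_zero=True))  # False
```

The test runs in polynomial time because it only compares pairs of sites. Once conflicts exist, finding the fewest events needed to resolve them is the problem whose hardness is discussed above.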