A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees

Background
Reticulate events play an important role in determining evolutionary relationships. Computing the minimum number of such events needed to explain the discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with a reticulation number larger than 40-50.

Results
Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations.

Conclusions
Using simulations, we demonstrate that these algorithms run quickly on large and difficult instances and produce solutions that are very close to optimal. As a spin-off from our simulations, we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees; we use it to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data.
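A standard way to reason about hybridization number is through agreement forests. For two rooted binary trees, the hybridization number equals the number of components of a maximum acyclic agreement forest minus one (reference [23]). A forest is acyclic when its inheritance graph contains no directed cycle. That graph has an arc from component i to component j whenever the root of i is an ancestor of the root of j in either tree. The Python sketch below only builds this graph and tests it for cycles; it is not CycleKiller or TerminusEst. It makes several assumptions of its own. Trees are nested tuples of leaf labels. The forest is a list of leaf sets that is assumed, not checked, to be an agreement forest. For brevity, it omits the extra root leaf ρ of the formal definition. The names `inheritance_graph` and `has_cycle` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# Assumption: a rooted tree is a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def index_tree(tree: Tree) -> Tuple[Dict[int, Optional[int]], Dict[str, int]]:
    """Number the nodes; return (parent of each node, node id of each leaf)."""
    parent: Dict[int, Optional[int]] = {}
    leaf_id: Dict[str, int] = {}
    stack: List[Tuple[Tree, Optional[int]]] = [(tree, None)]
    while stack:
        node, par = stack.pop()
        nid = len(parent)
        parent[nid] = par
        if isinstance(node, str):
            leaf_id[node] = nid
        else:
            stack.extend((child, nid) for child in node)
    return parent, leaf_id


def ancestors(parent: Dict[int, Optional[int]], v: int) -> List[int]:
    """Nodes on the path from v up to the root, starting with v itself."""
    path: List[int] = []
    cur: Optional[int] = v
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path


def lca(parent: Dict[int, Optional[int]], nodes: List[int]) -> int:
    """Lowest common ancestor: first node on nodes[0]'s root path shared by all."""
    paths = [ancestors(parent, v) for v in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    return next(v for v in paths[0] if v in common)


def inheritance_graph(t1: Tree, t2: Tree, forest: List[Set[str]]) -> Dict[int, Set[int]]:
    """Hypothetical helper: arc i -> j if root(i) is a proper ancestor of root(j) in t1 or t2.

    The forest is assumed to be an agreement forest; this is not verified here.
    """
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(forest))}
    for tree in (t1, t2):
        parent, leaf_id = index_tree(tree)
        roots = [lca(parent, [leaf_id[x] for x in comp]) for comp in forest]
        for i, ri in enumerate(roots):
            for j, rj in enumerate(roots):
                # ancestors(...)[1:] drops rj itself, leaving only proper ancestors.
                if i != j and ri in ancestors(parent, rj)[1:]:
                    graph[i].add(j)
    return graph


def has_cycle(graph: Dict[int, Set[int]]) -> bool:
    """Three-colour depth-first search; reaching a grey vertex means a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}

    def visit(v: int) -> bool:
        colour[v] = GREY
        for w in graph[v]:
            if colour[w] == GREY:
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in graph)


# Toy example (made up for illustration).
T1 = (("a", ("b", "y")), "x")
T2 = (("b", ("a", "x")), "y")
print(has_cycle(inheritance_graph(T1, T2, [{"a", "x"}, {"b", "y"}])))     # True
print(has_cycle(inheritance_graph(T1, T2, [{"a", "x"}, {"b"}, {"y"}])))  # False
```

In the toy example, the forest {a, x}, {b, y} is cyclic because each component sits above the other in one of the two trees. Splitting {b, y} into singletons removes the cycle at the cost of one extra component.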

[1] C. Semple, et al. Hybridization in Nonbinary Trees, 2009, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[2] Christian M. Zmasek, et al. GreenPhylDB v2.0: comparative and functional genomics in plants, 2010, Nucleic Acids Res.

[3] Yoshiko Wakabayashi, et al. The maximum agreement forest problem: Approximation algorithms and computational experiments, 2007, Theor. Comput. Sci.

[4] Leo van Iersel, et al. Cycle Killer...Qu'est-ce que c'est? On the Comparative Approximability of Hybridization Number and Directed Feedback Vertex Set, 2011, SIAM J. Discret. Math.

[5] Norbert Zeh, et al. Fixed-Parameter Algorithms for Maximum Agreement Forests, 2011, SIAM J. Comput.

[6] Norbert Zeh, et al. Fast FPT Algorithms for Computing Rooted Agreement Forests: Theory and Experiments, 2010, SEA.

[7] Daniel H. Huson, et al. Fast computation of minimum hybridization networks, 2012, Bioinformatics.

[8] Simone Linz, et al. Hybridization in Non-Binary Trees, 2008.

[9] Olivier Gascuel, et al. Mathematics of Evolution and Phylogeny, 2005.

[10] Simone Linz, et al. Quantifying Hybridization in Realistic Time, 2011, J. Comput. Biol.

[11] Steven Kelk, et al. Phylogenetic Networks: Concepts, Algorithms and Applications, 2012.

[12] Charles Semple, et al. Computing the minimum number of hybridization events for a consistent evolutionary history, 2007, Discret. Appl. Math.

[13] Simone Linz, et al. A Reduction Algorithm for Computing the Hybridization Number of Two Trees, 2007, Evolutionary Bioinformatics Online.

[14] Jörg Flum, et al. Parameterized Complexity Theory, 2006, Texts in Theoretical Computer Science. An EATCS Series.

[15] Norbert Zeh, et al. Fixed-Parameter and Approximation Algorithms for Maximum Agreement Forests of Multifurcating Trees, 2013, Algorithmica.

[16] D. Huson, et al. A Survey of Combinatorial Methods for Phylogenetic Networks, 2010, Genome Biology and Evolution.

[17] Steven Kelk, et al. Networks: expanding evolutionary thinking, 2013, Trends in Genetics.

[18] BMC Bioinformatics, 2005.

[19] J. Davenport Editor, 1960.

[20] Joseph Naor, et al. Approximating Minimum Feedback Sets and Multicuts in Directed Graphs, 1998, Algorithmica.

[21] Zhi-Zhong Chen, et al. An Ultrafast Tool for Minimum Reticulate Networks, 2013, J. Comput. Biol.

[22] D. Huson, et al. Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks, 2012, Systematic Biology.

[23] V. Moulton, et al. Bounding the Number of Hybridisation Events for a Consistent Evolutionary History, 2005, Journal of Mathematical Biology.

[24] Zhi-Zhong Chen, et al. Algorithms for Reticulate Networks of Multiple Phylogenetic Trees, 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[25] Olivier Gascuel, et al. Reconstructing evolution: new mathematical and computational advances, 2007.

[26] Vincent Berry, et al. Building species trees from larger parts of phylogenomic databases, 2011, Inf. Comput.

[27] Zhi-Zhong Chen, et al. HybridNET: a tool for constructing hybridization networks, 2010, Bioinformatics.

[28] Leo van Iersel, et al. Approximation Algorithms for Nonbinary Agreement Forests, 2012, SIAM J. Discret. Math.

[29] L. Nakhleh. Evolutionary Phylogenetic Networks: Models and Issues, 2010.

[30] Steven Kelk, et al. A Simple Fixed Parameter Tractable Algorithm for Computing the Hybridization Number of Two (Not Necessarily Binary) Trees, 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[31] Leo van Iersel, et al. A Practical Approximation Algorithm for Solving Massive Instances of Hybridization Number, 2012, WABI.

[32] Michael R. Fellows, et al. Parameterized Complexity, 1998.