Close lower and upper bounds for the minimum reticulate network of multiple phylogenetic trees

Motivation: A reticulate network is a model for displaying and quantifying the effects of complex reticulate processes on the evolutionary history of species. A central computational problem on reticulate networks is: given a set of phylogenetic trees (each for some region of the genome), reconstruct the most parsimonious reticulate network (called the minimum reticulate network) that combines the topological information contained in the given trees. This problem is well known to be NP-hard. Thus, existing approaches either work with only two input trees or make simplifying topological assumptions.

Results: We present novel results on the minimum reticulate network problem. Unlike existing approaches, we address the fully general problem: there is no restriction on the number of input trees, and there is no restriction on the form of the allowed reticulate network. We present lower and upper bounds on the number of reticulation events in the minimum reticulate network, and we infer an approximately parsimonious reticulate network. A program called PIRN implements these methods and also outputs a graphical representation of the inferred network. Empirical results on simulated and biological data show that our methods are practical for a wide range of data. More importantly, the lower and upper bounds match for many datasets (especially when the number of trees is small or the reticulation level is low), which allows us to solve the minimum reticulate network problem exactly for these datasets.

Availability: A software tool, PIRN, is available for download from the web page: http://www.engr.uconn.edu/~ywu.

Contact: ywu@engr.uconn.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
