MURPAR: A Fast Heuristic for Inferring Parsimonious Phylogenetic Networks from Multiple Gene Trees

Phylogenetic networks provide a graphical representation of evolutionary histories that involve non-treelike events, such as horizontal gene transfer (HGT). One approach to inferring phylogenetic networks is based on reconciling gene trees, under the assumption that all incongruence among the gene trees is due to HGT. Several mathematical results and algorithms, both exact and heuristic, have been introduced to construct and analyze phylogenetic networks. Here, we address the computational problem of inferring a phylogenetic network with the minimum number of reticulations from a collection of gene trees. Because this problem is known to be NP-hard even for a pair of gene trees, it is computationally very hard for larger collections. In this paper, we present an efficient heuristic, MURPAR, that infers a phylogenetic network from a collection of gene trees by using pairwise reconciliations of trees in the collection. Because efficient and accurate methods for pairwise gene tree reconciliation already exist, MURPAR inherits their efficiency and accuracy. Further, the method includes a formulation for combining pairwise reconciliations that is naturally amenable to an efficient integer linear programming (ILP) solution. On synthetic and biological data, we show that MURPAR produces more accurate results than other methods and is at least as fast. We believe that our method is especially important for rapidly obtaining estimates of genome-scale evolutionary histories that can be further refined by more detailed and compute-intensive methods.
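
To make the idea of combining pairwise reconciliations concrete, the following is a minimal sketch of one possible reading, not the formulation used by MURPAR itself. It assumes that each gene tree comes with several candidate reconciliations, each represented as a set of HGT events written as hypothetical (donor, recipient) pairs, and that the goal is to pick one candidate per gene tree so that the total number of distinct events is as small as possible. The function name `combine_reconciliations` and the toy data are illustrative assumptions, and exhaustive search stands in for the ILP solver mentioned above.

```python
from itertools import product


def combine_reconciliations(candidates):
    """Pick one candidate event set per gene tree, minimizing distinct events.

    `candidates` is a list with one entry per gene tree; each entry is a list
    of frozensets of HGT events (hypothetical (donor, recipient) tuples).
    Exhaustive search is used here purely for illustration; it is exponential
    in the number of gene trees, which is why an ILP would be used in practice.
    """
    best_choice, best_union = None, None
    for choice in product(*candidates):
        # Events shared between gene trees are counted only once.
        union = frozenset().union(*choice)
        if best_union is None or len(union) < len(best_union):
            best_choice, best_union = choice, union
    return best_choice, best_union


# Toy input (assumed, not taken from the paper): three gene trees.
toy_candidates = [
    [frozenset({("A", "B")}), frozenset({("C", "D")})],
    [frozenset({("C", "D"), ("E", "F")}), frozenset({("A", "B"), ("G", "H")})],
    [frozenset({("A", "B")})],
]

choice, events = combine_reconciliations(toy_candidates)
print(len(events), sorted(events))
```

In this toy input, selecting the candidates that share the event ("A", "B") lets one event explain incongruence in all three gene trees, which is the kind of sharing a parsimony objective rewards.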
