Seeing the trees and their branches in the network is hard

Phylogenetic networks are a restricted class of directed acyclic graphs that model evolutionary histories in the presence of reticulate evolutionary events, such as horizontal gene transfer, hybrid speciation, and recombination. Characterizing a phylogenetic network as a collection of trees and their branches has long been the basis for several methods of reconstructing and evaluating phylogenetic networks. These characterizations have also been used to understand molecular sequence evolution on phylogenetic networks. In this paper, we address theoretical questions about phylogenetic networks, their characterizations, and sequence evolution on them. In particular, we prove that the problem of deciding whether a given tree is contained inside a network is NP-complete. Further, we prove that the problem of deciding whether a branch of a given tree is also a branch of a given network is polynomially equivalent to that of deciding whether the evolution of a molecular character (site) on a network is governed by the infinite site model. Exploiting this equivalence, we establish the NP-completeness of both problems and provide a parameterized algorithm that runs in time $O(2^{k/2} n^2)$, where $n$ is the total number of nodes and $k$ is the number of recombination nodes in the network. This significantly improves upon the trivial brute-force $O(2^k n)$ time algorithm for the problem, a reduction that matters particularly when analyzing recombination hotspots.
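
The trivial brute-force approach mentioned above can be illustrated with a minimal sketch. It assumes a particular representation that the abstract does not specify: the network is a list of directed `(parent, child)` edges, a recombination node is any node with more than one parent, and a "branch" is identified with the set of leaves below an edge (its cluster). The function name `cluster_in_network` is hypothetical. The sketch enumerates the trees obtained by keeping one parent edge per recombination node, up to 2^k of them when each such node has two parents, and checks each tree for the target cluster. It does not attempt to match the stated running-time bounds exactly, and it is not the improved parameterized algorithm.

```python
from itertools import product


def cluster_in_network(edges, cluster):
    """Brute-force check (illustrative assumption, not the paper's algorithm):
    is `cluster` the leaf set below some edge of some tree obtained by
    keeping exactly one parent edge per recombination node?"""
    parents = {}
    has_children = set()
    nodes = set()
    for u, v in edges:
        parents.setdefault(v, []).append(u)
        has_children.add(u)
        nodes.update((u, v))

    leaves = nodes - has_children
    # Recombination (reticulation) nodes: more than one incoming edge.
    retic = sorted(v for v in nodes if len(parents.get(v, [])) > 1)
    target = frozenset(cluster)

    # One iteration per way of choosing a single parent for every
    # recombination node.
    for choice in product(*(parents[r] for r in retic)):
        kept = dict(zip(retic, choice))

        # Child lists of the tree induced by this choice.
        tree = {}
        for u, v in edges:
            if v in kept and kept[v] != u:
                continue  # this recombination edge is switched off
            tree.setdefault(u, []).append(v)

        memo = {}

        def leaves_below(v):
            # Set of leaves reachable from v in the current tree.
            if v not in memo:
                if v in leaves:
                    memo[v] = frozenset([v])
                else:
                    memo[v] = frozenset().union(
                        *(leaves_below(c) for c in tree.get(v, []))
                    )
            return memo[v]

        if any(leaves_below(v) == target
               for kids in tree.values() for v in kids):
            return True
    return False


# Toy network: h is a recombination node with parents a and b.
toy_edges = [
    ("root", "a"), ("root", "b"),
    ("a", "x"), ("a", "h"),
    ("b", "h"), ("b", "y"),
    ("h", "z"),
]
```

On the toy network, `cluster_in_network(toy_edges, {"x", "z"})` and `cluster_in_network(toy_edges, {"y", "z"})` each hold in one of the two induced trees, while `{"x", "y"}` appears in neither. The outer loop is what makes the cost exponential in the number of recombination nodes.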
