Efficient reconstruction of phylogenetic networks with constrained recombination

A phylogenetic network is a generalization of a phylogenetic tree that allows structural properties that are not treelike. With the growth of genomic data, much of which does not fit ideal tree models, there is a greater need to understand the algorithmics and combinatorics of phylogenetic networks. We consider the problem of determining whether a given set of sequences can be derived on a phylogenetic network in which the recombination cycles are node-disjoint. In this paper, we call such a phylogenetic network a "galled tree". By analysing more deeply the combinatorial constraints on cycle-disjoint phylogenetic networks, we obtain an efficient algorithm that is guaranteed to be both a necessary and a sufficient test for the existence of a galled tree for the data. If a galled tree exists, the algorithm constructs one and obtains an implicit representation of all galled trees for the data, from which each one can be generated in linear time. We also note two additional results related to galled trees. First, any set of sequences that can be derived on a galled tree can also be derived on a true tree (without recombination cycles) in which at most one back mutation is allowed per site. Second, the site compatibility problem (which is NP-hard in general) can be solved in linear time for any set of sequences that can be derived on a galled tree. The combinatorial constraints we develop apply, for the most part, to node-disjoint cycles in any phylogenetic network (not just galled trees), and can be used, for example, to prove that a given site cannot be on a node-disjoint cycle in any phylogenetic network. Perhaps more important than the specific results about galled trees, we introduce an approach that can be used to study recombination in phylogenetic networks that go beyond galled trees.
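The abstract does not spell out the algorithm's internals. As a minimal illustration of the kind of pairwise site structure such combinatorial arguments work with, the sketch below assumes the sequences are binary strings of equal length and applies the standard four-gamete test: two sites are incompatible when all four pairs 00, 01, 10 and 11 occur. When the ancestral sequence is not specified, a set of binary sequences can be derived on a tree with no recombination and no back mutation exactly when no pair of sites is incompatible. The function names `incompatible` and `conflict_components` are hypothetical, and grouping conflicting sites into connected components is shown only as an illustrative step, not as the paper's full galled-tree algorithm.

```python
from itertools import combinations


def incompatible(seqs, i, j):
    """Four-gamete test (assumes binary '0'/'1' strings): sites i and j are
    incompatible iff all four pairs 00, 01, 10, 11 occur among the rows."""
    gametes = {(s[i], s[j]) for s in seqs}
    return len(gametes) == 4


def conflict_components(seqs):
    """Build the incompatibility graph over sites and return its connected
    components that contain more than one site (isolated sites are skipped)."""
    m = len(seqs[0])
    adj = {k: set() for k in range(m)}
    for i, j in combinations(range(m), 2):
        if incompatible(seqs, i, j):
            adj[i].add(j)
            adj[j].add(i)

    seen, comps = set(), []
    for start in range(m):
        # Skip sites already visited and sites with no conflicts.
        if start in seen or not adj[start]:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


def fits_tree_without_recombination(seqs):
    """No incompatible pair means the binary sequences fit a tree with no
    recombination and no back mutation (ancestral sequence unspecified)."""
    return not conflict_components(seqs)


if __name__ == "__main__":
    # Sites 0 and 1 show all four gametes; site 2 conflicts with neither.
    print(conflict_components(["000", "010", "100", "111"]))
```

Running the example prints `[[0, 1]]`. The quadratic pairwise scan is chosen for clarity; it is not meant to reflect the efficiency bounds claimed in the abstract.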
