A Resolution of the Static Formulation Question for the Problem of Computing the History Bound

Evolutionary data have traditionally been modeled with phylogenetic trees; however, a branching structure alone cannot represent conflicting phylogenetic signals, so networks are used instead. Ancestral recombination graphs (ARGs) model the evolution of incompatible sets of SNP data while allowing each site to mutate only once, and the usual objective is to minimize the number of recombinations. Similarly, incompatible cluster data can be represented by a reticulation network that minimizes the number of reticulation events. The ARG literature has traditionally been disjoint from the reticulation network literature. By building on results from the reticulation network literature, we resolve an open question of interest to the ARG community. The History Bound is a lower bound on the number of recombinations in an ARG for a binary matrix, and it was previously defined only procedurally. We explicitly prove that it equals the minimum number of reticulation nodes in a network for the corresponding cluster data. To facilitate the proof, we give an algorithm that constructs this network using intermediate values from the procedural History Bound definition. We then develop a top-down algorithm for computing the History Bound, which has the same worst-case runtime as the known dynamic program, and show that it is likely to run faster in typical cases.
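As a small illustration of what "incompatible" SNP data means when each site may mutate only once, the sketch below applies the standard four-gamete test to a binary matrix (rows are sequences, columns are sites). It is not taken from the paper: the function names `sites_incompatible` and `incompatible_pairs` are hypothetical, and the test shown assumes the ancestral sequence is unknown.

```python
from itertools import combinations

# All four possible combinations of states at a pair of binary sites.
ALL_GAMETES = {(0, 0), (0, 1), (1, 0), (1, 1)}


def sites_incompatible(matrix, i, j):
    """Four-gamete test: columns i and j cannot both fit on one tree
    (with one mutation per site) if all four gametes appear."""
    gametes = {(row[i], row[j]) for row in matrix}
    return ALL_GAMETES <= gametes


def incompatible_pairs(matrix):
    """Return every pair of column indices that fails the four-gamete test."""
    n_sites = len(matrix[0]) if matrix else 0
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if sites_incompatible(matrix, i, j)
    ]


if __name__ == "__main__":
    # Hypothetical example data: sites 0 and 1 show all four gametes.
    example = [
        [0, 0, 0],
        [0, 1, 0],
        [1, 0, 1],
        [1, 1, 1],
    ]
    print(incompatible_pairs(example))
```

A matrix with no incompatible pairs admits a perfect phylogeny and needs no recombination; any incompatible pair forces at least one recombination, which is the kind of conflict that lower bounds such as the History Bound quantify.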
