Calculating the Unrooted Subtree Prune-and-Regraft Distance

The subtree prune-and-regraft (SPR) distance is a fundamental metric for comparing evolutionary trees. It has wide-ranging applications, including the study of lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the SPR distance between rooted trees can be computed relatively efficiently using fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF formulation is known for the unrooted case. Correspondingly, previous algorithms are unable to compute unrooted SPR distances larger than 7. In this paper, we substantially advance both the understanding of the unrooted SPR distance and the algorithms for computing it. First, we identify four properties of optimal SPR paths, each of which suggests that no MAF formulation exists in the unrooted case. Then, we introduce the replug distance, a new lower bound on the unrooted SPR distance that is amenable to MAF methods, and give an efficient fixed-parameter algorithm for calculating it. Finally, we develop a “progressive A*” search algorithm that uses multiple heuristics, including the TBR and replug distances, to compute the unrooted SPR distance exactly. Our algorithm is nearly two orders of magnitude faster than previous methods on small trees, and allows computation of unrooted SPR distances as large as 14 on trees with 50 leaves.
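
To illustrate the general idea of guiding an exact search with several lower bounds, the following is a minimal sketch of ordinary A* search over unit-cost moves, using the maximum of multiple admissible heuristics. It is not the progressive A* algorithm of the paper and does not implement the TBR or replug distances. The names a_star_distance, grid_neighbors, dx_bound, and dy_bound are hypothetical, and a small grid with two simple lower bounds stands in for the space of trees and its heuristics.

```python
import heapq
from itertools import count


def a_star_distance(start, goal, neighbors, heuristics):
    """Return the minimum number of unit-cost moves from start to goal.

    Each heuristic must be a lower bound on the true remaining distance
    (admissible). The maximum of admissible bounds is still admissible,
    so combining them keeps the search exact. Returns None if the goal
    cannot be reached.
    """
    def h(state):
        return max(f(state, goal) for f in heuristics)

    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(h(start), next(tie), 0, start)]
    best = {start: 0}  # cheapest known distance to each visited state
    while frontier:
        _, _, dist, state = heapq.heappop(frontier)
        if state == goal:
            return dist
        if dist > best.get(state, float("inf")):
            continue  # stale entry: a shorter route was found later
        for nxt in neighbors(state):
            nd = dist + 1
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(frontier, (nd + h(nxt), next(tie), nd, nxt))
    return None


# Toy stand-in for a tree-rearrangement graph: a 6 x 6 grid where a
# move changes one coordinate by 1. This is an assumption made purely
# for illustration.
def grid_neighbors(state):
    x, y = state
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx <= 5 and 0 <= ny <= 5:
            yield (nx, ny)


def dx_bound(state, goal):
    # Lower bound: at least |dx| moves are needed.
    return abs(state[0] - goal[0])


def dy_bound(state, goal):
    # Lower bound: at least |dy| moves are needed.
    return abs(state[1] - goal[1])


if __name__ == "__main__":
    print(a_star_distance((0, 0), (3, 2), grid_neighbors, [dx_bound, dy_bound]))
```

In this sketch each bound alone is weak, but their maximum prunes more of the search than either does individually, which is the motivation for combining several heuristics in one search.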
