Chain Reduction Preserves the Unrooted Subtree Prune-and-Regraft Distance

The subtree prune-and-regraft (SPR) distance metric is a fundamental way of comparing evolutionary trees. It has wide-ranging applications, including the study of lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the rooted version of the SPR distance can be computed relatively efficiently between rooted trees using fixed-parameter-tractable algorithms, in the unrooted case previous algorithms are unable to compute distances larger than 7. One important tool for efficient computation in the rooted case is chain reduction, which replaces an arbitrary chain of subtrees identical in both trees with a chain of three leaves. Whether chain reduction preserves SPR distance in the unrooted case has remained an open question since it was conjectured in 2001 by Allen and Steel, and it was presented as a challenge question at the 2007 Isaac Newton Institute for Mathematical Sciences program on phylogenetics. In this paper we prove that chain reduction preserves the unrooted SPR distance. We do so by introducing a structure called a socket agreement forest, which restricts edge modification to predetermined socket vertices and thereby permits detailed analysis and modification of SPR move sequences. This new chain reduction theorem reduces the unrooted distance problem to a linear-size problem kernel, substantially improving on the previous best quadratic-size kernel.
