Maximum parsimony distance on phylogenetic trees: a linear kernel and constant factor approximation algorithm

Maximum parsimony distance is a measure used to quantify the dissimilarity of two unrooted phylogenetic trees. It is NP-hard to compute, and very few positive algorithmic results are known, owing to its complex combinatorial structure. Here we address this shortcoming by showing that the problem is fixed-parameter tractable. We do this by establishing a linear kernel: after certain reduction rules are applied, the resulting instance has size bounded by a linear function of the distance. As powerful corollaries to this result, we prove that the problem permits a polynomial-time constant-factor approximation algorithm; that the treewidth of a natural auxiliary graph structure encountered in phylogenetics is bounded by a function of the distance; and that the distance is within a constant factor of the size of a maximum agreement forest of the two trees, a well-studied object in phylogenetics.
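
For readers unfamiliar with the measure, the following minimal sketch illustrates the standard definition, which the abstract does not restate: the maximum, over all characters (assignments of states to taxa), of the absolute difference between the parsimony scores of the two trees. It is a brute-force illustration for very small trees only, not the kernelization or approximation algorithm of the paper. The nested-tuple tree encoding and the function names `fitch`, `parsimony_score`, and `mp_distance_bruteforce` are assumptions made for this example; each unrooted binary tree is rooted on an arbitrary edge, which does not change the parsimony score.

```python
from itertools import product


def leaves(tree):
    """Return the taxon labels of a tree given as nested 2-tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def fitch(tree, character):
    """Fitch's bottom-up pass: return (candidate state set, number of changes)."""
    if isinstance(tree, str):
        return {character[tree]}, 0
    left_states, left_cost = fitch(tree[0], character)
    right_states, right_cost = fitch(tree[1], character)
    common = left_states & right_states
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one additional change is charged.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, character):
    """Minimum number of state changes needed to explain the character on the tree."""
    return fitch(tree, character)[1]


def mp_distance_bruteforce(tree1, tree2):
    """Maximize |score(tree1) - score(tree2)| over all characters.

    Enumerates n^n characters for n taxa, so it is exponential and
    intended only as an illustration on tiny inputs.
    """
    taxa = sorted(leaves(tree1))
    if taxa != sorted(leaves(tree2)):
        raise ValueError("both trees must be on the same taxon set")
    best = 0
    for states in product(range(len(taxa)), repeat=len(taxa)):
        character = dict(zip(taxa, states))
        diff = abs(parsimony_score(tree1, character) - parsimony_score(tree2, character))
        best = max(best, diff)
    return best


# Two different quartet trees on taxa a, b, c, d (hypothetical example input).
t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(mp_distance_bruteforce(t1, t2))
```

The exhaustive enumeration is what makes this approach impractical beyond a handful of taxa, which is the motivation for the parameterized results summarized above.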
