Advancing Divide-and-Conquer Phylogeny Estimation using Robinson-Foulds Supertrees

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species and spanning all life on earth. Constructing the Tree of Life is enormously computationally challenging, because the most accurate current methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving the scalability and accuracy of phylogeny estimation is divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and the subset trees are then merged using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees under the Robinson-Foulds Supertree (RFS) criterion, a major approach in supertree estimation that is related to maximum likelihood supertrees. We also prove that finding the RFS of three input trees is NP-hard. In addition, we present GreedyRFS, a greedy heuristic that repeatedly applies Exact-RFS-2 to pairs of trees until all the trees are merged into a single supertree. We evaluate Exact-RFS-2 and GreedyRFS and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on GitHub at github.com/yuxilin51/GreedyRFS.
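To make the RFS criterion concrete, the following minimal sketch scores a candidate supertree against a set of source trees by summing the Robinson-Foulds distances between each source tree and the supertree restricted to that source tree's leaves. It is an illustration only, not Exact-RFS-2 or any code from the GreedyRFS repository: the nested-tuple tree encoding and the names `leaves`, `restrict`, `bipartitions`, `rf_distance`, and `rfs_score` are assumptions made for this example, and trees are treated as unrooted.

```python
# Illustrative sketch only: trees are nested tuples with string leaves,
# e.g. (("a", "b"), ("c", "d")). All names here are hypothetical.

def leaves(tree):
    """Return the leaf set of a tree as a frozenset."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing unary nodes.

    Returns None if no leaf of this subtree is kept.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)

def bipartitions(tree):
    """Return the nontrivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that excludes a fixed reference
    leaf, so the two child clades of the root map to the same split.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits

def rf_distance(t1, t2):
    """Robinson-Foulds distance between two trees on the same leaf set."""
    return len(bipartitions(t1) ^ bipartitions(t2))

def rfs_score(supertree, sources):
    """Sum of RF distances between each source tree and the restricted supertree."""
    return sum(rf_distance(restrict(supertree, leaves(s)), s) for s in sources)

sources = [(("a", "b"), ("c", "d")), (("c", "d"), ("e", "f"))]
compatible = ((("a", "b"), ("c", "d")), ("e", "f"))
conflicting = ((("a", "c"), ("b", "d")), ("e", "f"))
print(rfs_score(compatible, sources), rfs_score(conflicting, sources))
```

In this toy case the first candidate displays both source trees and so has the lowest possible score, while the second disagrees with the first source tree on how a, b, c, and d are split. An RFS method searches for a supertree minimizing this score; the sketch only evaluates a given candidate.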
