Supertree Construction: Opportunities and Challenges

Supertree construction is the process by which a set of phylogenetic trees, each on a subset of the overall set S of species, is combined into a tree on the full set S. The traditional use of supertree methods is the assembly of a large species tree from previously computed smaller species trees; however, supertree methods are also used to address large-scale tree estimation using divide-and-conquer (i.e., a dataset is divided into overlapping subsets, trees are constructed on the subsets, and the subset trees are then combined using a supertree method). Because most supertree methods are heuristics for NP-hard optimization problems, the use of supertree estimation on large datasets is challenging, both in terms of scalability and accuracy. In this paper, we describe the current state of the art in supertree construction and the use of supertree methods in divide-and-conquer strategies. Finally, we identify directions where future research could lead to improved supertree methods.
