Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods

Quartet trees displayed by larger phylogenetic trees have long been used as inputs for species tree and supertree reconstruction. In many practical problems with large numbers of taxa, computational constraints prevent the use of all displayed quartets. We introduce the notion of an Efficient Quartet System (EQS), which represents a phylogenetic tree by a subset of the quartets the tree displays. We show mathematically that the set of quartets obtained from a tree via an EQS contains all of the combinatorial information of the tree itself. Using performance tests on simulated datasets, we also demonstrate that using an EQS to reduce the number of quartets, both in summary method pipelines for species tree inference and in methods for supertree inference, results in only small reductions in accuracy.
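
To make the scaling problem concrete, the following minimal sketch enumerates every quartet displayed by a small tree. It does not construct an EQS, whose definition is not given in this abstract; it only illustrates the full quartet set that an EQS is meant to subsample. The function name `displayed_quartets`, the split-based tree encoding, and the six-taxon example tree are all assumptions made for illustration. A binary tree on n taxa displays one resolved quartet for each of the C(n, 4) four-taxon subsets, which is already 3,921,225 quartets for 100 taxa.

```python
from itertools import combinations
from math import comb


def displayed_quartets(taxa, splits):
    """Return the resolved quartets ab|cd displayed by an unrooted tree.

    The tree is given by its nontrivial splits: `splits` holds one side of
    each internal-edge bipartition as a frozenset of taxon labels.
    (Hypothetical encoding chosen for this sketch.)
    """
    taxa = set(taxa)
    quartets = set()
    for side in splits:
        other = taxa - side
        # Any two taxa from one side and two from the other form a quartet
        # that this internal edge separates.
        for pair_a in combinations(sorted(side), 2):
            for pair_b in combinations(sorted(other), 2):
                quartets.add(frozenset([frozenset(pair_a), frozenset(pair_b)]))
    return quartets


# Hypothetical 6-taxon caterpillar tree ((a,b),c,(d,(e,f))), written as splits.
taxa = "abcdef"
splits = [frozenset("ab"), frozenset("abc"), frozenset("abcd")]
quartets = displayed_quartets(taxa, splits)

# A binary tree resolves every 4-taxon subset, so the counts agree.
print(len(quartets), comb(len(taxa), 4))

# The full quartet set grows as C(n, 4), roughly n^4 / 24.
for n in (10, 100, 1000):
    print(n, comb(n, 4))
```

Storing each quartet as an unordered pair of unordered pairs means that ab|cd and cd|ab are treated as the same quartet, so duplicates contributed by different internal edges collapse automatically.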