SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space.

Species tree estimation from multi-locus datasets is complicated by processes such as incomplete lineage sorting (ILS) that cause different loci to have different trees. Summary methods, which estimate species trees by combining gene trees, are popular, but their accuracy is impaired by gene tree estimation error. Other approaches use only the site patterns to estimate the species tree, and so are not affected by gene tree estimation error. In particular, PAUP* provides a method in which SVDquartets is used to compute a set Q of quartet trees (i.e., trees on four leaves), and a heuristic search is then used to combine the quartet trees into a species tree T, seeking to maximize the number of quartet trees in Q that agree with T. The PAUP* method based on SVDquartets (henceforth referred to as SVDquartets + PAUP*) is increasingly used in phylogenomic studies because it can reconstruct species trees without requiring accurate gene tree estimates. We present SVDquest*, a new method for constructing species trees from site patterns that is guaranteed to produce species trees that satisfy at least as many quartet trees as SVDquartets + PAUP*. We show that SVDquest* is competitive with ASTRAL and ASTRID (two leading summary methods) in terms of topological accuracy, and tends to be more accurate than ASTRAL and ASTRID under conditions with relatively high gene tree estimation error. SVDquest* is available in open source form at https://github.com/pranjalv123/SVDquest.
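The optimization criterion described above, the number of quartet trees in Q that agree with a candidate tree T, can be illustrated with a minimal Python sketch. This is not the SVDquest implementation: the nested-tuple tree encoding and the function names (`clades`, `quartet`, `displays`, `quartet_score`) are assumptions made for illustration only. A quartet tree ab|cd is counted as agreeing with T when some edge of T separates {a, b} from {c, d}.

```python
def clades(tree):
    """Return (leaf set of `tree`, leaf sets found below each edge).

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical encoding).
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
        found.append(child_leaves)  # leaf set on the far side of the edge to `child`
    return leaves, found


def quartet(a, b, c, d):
    """Encode the quartet tree ab|cd as an unordered pair of unordered pairs."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def displays(sides, all_leaves, q):
    """True if some edge separates the two pairs of quartet tree `q`."""
    left, right = tuple(q)
    for side in sides:
        other = all_leaves - side
        if (left <= side and right <= other) or (right <= side and left <= other):
            return True
    return False


def quartet_score(tree, quartets):
    """Count the quartet trees in `quartets` that agree with `tree`."""
    all_leaves, sides = clades(tree)
    return sum(displays(sides, all_leaves, q) for q in quartets)


if __name__ == "__main__":
    T = ((("A", "B"), "C"), ("D", "E"))
    Q = [quartet("A", "B", "C", "D"), quartet("A", "C", "B", "D"),
         quartet("A", "B", "D", "E"), quartet("A", "D", "C", "E")]
    print(quartet_score(T, Q))
```

A search procedure, whether heuristic or exact within a constrained space, would compare candidate trees by this score and keep the one that satisfies the most quartet trees in Q. The sketch checks every edge for every quartet and is meant only to make the criterion concrete, not to be efficient.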
