New Methods to Calculate Concordance Factors for Phylogenomic Datasets

We implement two measures for quantifying genealogical concordance in phylogenomic datasets: the gene concordance factor (gCF) and the novel site concordance factor (sCF). For every branch of a reference tree, gCF is defined as the percentage of “decisive” gene trees containing that branch. This measure is already widely used, but here we introduce a package that calculates it while accounting for variable taxon coverage among gene trees. sCF is a new measure defined as the percentage of decisive alignment sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of underlying disagreement among loci and sites. An easy-to-use implementation and tutorial are freely available in the IQ-TREE software package (http://www.iqtree.org).
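To make the two definitions concrete, here is a minimal Python sketch under stated assumptions. It is not IQ-TREE's implementation. It assumes a reference branch is described by the four clades A, B, C, D around it (A and B on one side, C and D on the other). It also assumes a gene tree counts as decisive when it samples at least one taxon from each of the four clades, and it uses a hypothetical gene-tree encoding: the taxon set plus one side of each internal branch. For sCF it assumes a parsimony-style rule on randomly drawn quartets with one taxon per clade. A site with only A/C/G/T states supports the concordant resolution ab|cd when a = b ≠ c = d, and the two discordant resolutions ac|bd and ad|bc are defined analogously. Per-quartet percentages are then averaged. The helper names (`canonical`, `make_gene_tree`, `gcf`, `site_support`, `scf`) and the quartet count are illustrative assumptions.

```python
import random

NUCLEOTIDES = set("ACGT")


def canonical(side, taxa):
    """Orientation-free key for the bipartition side | (taxa - side)."""
    side = frozenset(side)
    return frozenset({side, taxa - side})


def make_gene_tree(taxa, sides):
    """Hypothetical gene-tree encoding: taxon set + set of internal splits."""
    taxa = frozenset(taxa)
    return taxa, {canonical(s, taxa) for s in sides}


def gcf(clades, gene_trees):
    """Percentage of decisive gene trees that contain the branch AB|CD."""
    A, B, C, D = (frozenset(c) for c in clades)
    decisive = concordant = 0
    for taxa, splits in gene_trees:
        # Assumed decisiveness rule: the gene tree samples every clade.
        if not all(taxa & clade for clade in (A, B, C, D)):
            continue
        decisive += 1
        # Restrict the reference branch to this gene tree's taxa and look it up.
        if canonical((A | B) & taxa, taxa) in splits:
            concordant += 1
    return 100.0 * concordant / decisive if decisive else float("nan")


def site_support(a, b, c, d):
    """0 if the site supports ab|cd, 1 for ac|bd, 2 for ad|bc, else None."""
    if a == b and c == d and a != c:
        return 0
    if a == c and b == d and a != b:
        return 1
    if a == d and b == c and a != b:
        return 2
    return None


def scf(clades, alignment, n_quartets=100, seed=0):
    """Mean over random quartets of the % of decisive sites supporting ab|cd."""
    rng = random.Random(seed)
    groups = [sorted(c) for c in clades]
    per_quartet = []
    for _ in range(n_quartets):
        # Draw one taxon from each of the four clades around the branch.
        a, b, c, d = (rng.choice(g) for g in groups)
        counts = [0, 0, 0]
        for site in zip(alignment[a], alignment[b], alignment[c], alignment[d]):
            if not NUCLEOTIDES.issuperset(site):
                continue  # assumption: gaps/ambiguity codes are never decisive
            k = site_support(*site)
            if k is not None:
                counts[k] += 1
        if sum(counts):
            per_quartet.append(100.0 * counts[0] / sum(counts))
    return sum(per_quartet) / len(per_quartet) if per_quartet else float("nan")


if __name__ == "__main__":
    clades = (["a1", "a2"], ["b1"], ["c1"], ["d1", "d2"])
    genes = [
        make_gene_tree({"a1", "b1", "c1", "d1"}, [{"a1", "b1"}]),  # concordant
        make_gene_tree({"a1", "b1", "c1", "d1"}, [{"a1", "c1"}]),  # discordant
        make_gene_tree({"a1", "a2", "c1", "d1"}, [{"a1", "a2"}]),  # no B: not decisive
        make_gene_tree({"a2", "b1", "c1", "d2"}, [{"a2", "b1"}]),  # concordant
    ]
    print(round(gcf(clades, genes), 1))  # → 66.7

    aln = {"a": "AACGT-", "b": "AACTT-", "c": "GGCGA-", "d": "GGCTA-"}
    print(scf((["a"], ["b"], ["c"], ["d"]), aln))  # → 75.0
```

In the demo, the third gene tree lacks any taxon from clade B, so it is left out of the denominator rather than counted as discordant. That is how this sketch accounts for variable taxon coverage among gene trees: gCF here is 2 concordant out of 3 decisive gene trees.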

Published in Molecular Biology and Evolution, 2020.