Variation Across Mitochondrial Gene Trees Provides Evidence for Systematic Error: How Much Gene Tree Variation Is Biological?

Abstract.— The use of large genomic data sets in phylogenetics has highlighted extensive topological variation across genes. Much of this discordance is assumed to result from biological processes. However, variation among gene trees can also be a consequence of systematic error driven by poor model fit, and the relative importance of biological vs. methodological factors in explaining gene tree variation is a major unresolved question. Using mitochondrial genomes to control for biological causes of gene tree variation, we estimate the extent of gene tree discordance driven by systematic error and employ posterior prediction to highlight the role of model fit in producing this discordance. We find that the amount of discordance among mitochondrial gene trees is similar to the amount found in other studies that assume only biological causes of variation. This similarity suggests that the role of systematic error in generating gene tree variation is underappreciated, and that critically evaluating the fit between assumed models and the data used for inference is important for resolving outstanding phylogenetic questions.
