Inferring Species Trees Using Integrative Models of Species Evolution

Bayesian methods can accurately estimate species tree topologies, divergence times and other parameters, but only when the models of evolution used sufficiently account for the underlying evolutionary processes. Multispecies coalescent (MSC) models have been shown to accurately account for the evolution of genes within species in the absence of strong gene flow between lineages, and fossilized birth-death (FBD) models have been shown to estimate divergence times from fossil data in good agreement with expert opinion. Until now, dating analyses using the MSC have been based on a fixed clock or informally derived node priors instead of the FBD process. Conversely, dating analyses using an FBD process have concatenated all gene sequences and ignored coalescent processes. To address these mirror-image deficiencies, we have developed an integrative model of evolution which combines the FBD and MSC models. By applying both concatenation and the MSC (without the FBD process) to an exemplar data set consisting of molecular sequence data and morphological characters from the dog and fox subfamily Caninae, we show that concatenation causes predictable biases in estimated branch lengths. We then apply both concatenation with the FBD process and the combined FBD-MSC model to show that the same concatenation biases persist when the FBD process is employed. These biases can be avoided by using the FBD-MSC model, which coherently models fossilization and gene evolution, and does not require an a priori substitution rate estimate to calibrate the molecular clock. We have implemented the FBD-MSC model in a new version of StarBEAST2, a package developed for the BEAST2 phylogenetic software.
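As a rough illustration of why ignoring coalescent processes can distort branch lengths, the sketch below simulates gene divergence times for two sister species under a simple two-lineage coalescent. It is not the FBD-MSC model or any part of StarBEAST2. The function name, the parameter values, and the assumptions of a constant diploid ancestral population and one sampled lineage per species are all hypothetical choices made for this example.

```python
import random


def simulate_gene_divergence(species_split, ne, n_loci, seed=0):
    """Simulate gene divergence times (in generations) for two sister species.

    Illustrative assumptions, not taken from the paper:
    - one sampled lineage per species, so no coalescence occurs before the split;
    - a constant diploid ancestral population of effective size `ne`;
    - no gene flow after the split.
    Looking backwards in time, two lineages in the ancestral population
    coalesce at rate 1 / (2 * ne) per generation, so the extra waiting time
    beyond the species split is exponential with mean 2 * ne generations.
    """
    rng = random.Random(seed)
    mean_wait = 2.0 * ne
    return [
        species_split + rng.expovariate(1.0 / mean_wait)
        for _ in range(n_loci)
    ]


# Hypothetical values: split 100,000 generations ago, ancestral Ne of 10,000.
times = simulate_gene_divergence(species_split=100_000, ne=10_000, n_loci=5_000)
mean_gene_divergence = sum(times) / len(times)

print(f"species divergence:   {100_000} generations")
print(f"mean gene divergence: {mean_gene_divergence:.0f} generations")
```

Every simulated gene divergence is at least as old as the species divergence, and the average excess is close to 2 × Ne generations. A method that treats all loci as sharing the species tree has no term for this excess, which is one way to see how concatenation could shift branch length estimates in a predictable direction.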
