BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis

Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years, with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information, among others. Incorporating all relevant data into a single joint model is very challenging, both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, ranging from practical software design, development and engineering to statistical and conceptual modelling. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.

Author summary

Bayesian phylogenetic inference methods have undergone considerable development in recent years, and joint modelling of rich evolutionary data, including genomes, phenotypes and fossil occurrences, is increasingly common. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing scientific software is increasingly crucial to advancement in many fields of biology. The challenges range from practical software development and engineering, distributed team coordination, conceptual development and statistical modelling, to validation and testing. BEAST 2 is one such computational software platform for phylogenetics, population genetics and phylodynamics, and was first announced over 4 years ago. Here we describe the full range of new tools and models available on the BEAST 2.5 platform, which expand joint evolutionary inference in many new directions, especially for joint inference over multiple data types, non-tree models and complex phylodynamics.

[117]  Alexei J. Drummond,et al.  Bayesian Selection of Nucleotide Substitution Models and Their Site Assignments , 2012, Molecular biology and evolution.