Practical Speedup of Bayesian Inference of Species Phylogenies by Restricting the Space of Gene Trees

Species tree inference from multi-locus data has emerged as a powerful paradigm in the post-genomic era, both in terms of the accuracy of the species trees it produces and in terms of elucidating the processes that shaped evolutionary history. Bayesian methods for species tree inference are desirable in this area because they have been shown to yield accurate estimates and they naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this paper, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of those constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.
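To make the notion of "resolutions of a constrained tree" concrete, the following is a minimal Python sketch, not the implementation used in the paper. It assumes rooted trees encoded as nested tuples of leaf names, and all function names are hypothetical. A binary tree counts as a resolution of an unresolved tree when every cluster (set of leaves below an internal node) of the unresolved tree also appears in the binary tree. The sketch also counts how many rooted binary resolutions an unresolved tree admits, using the standard fact that k labeled leaves admit (2k-3)!! rooted binary topologies.

```python
from math import prod

# Assumption: a rooted tree is a leaf name (str) or a tuple of child subtrees.

def leaves(tree):
    """Return the set of leaf names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Return the leaf sets induced by every internal node of the tree."""
    if isinstance(tree, str):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= clusters(child)
    return result

def is_binary(tree):
    """True if every internal node has exactly two children."""
    return isinstance(tree, str) or (
        len(tree) == 2 and all(is_binary(child) for child in tree)
    )

def is_resolution(candidate, constraint):
    """True if `candidate` is a binary tree that keeps every constraint cluster."""
    return (
        is_binary(candidate)
        and leaves(candidate) == leaves(constraint)
        and clusters(constraint) <= clusters(candidate)
    )

def rooted_binary_topologies(k):
    """Number of rooted binary topologies on k labeled leaves: (2k-3)!!."""
    return prod(range(1, 2 * k - 2, 2))

def count_resolutions(tree):
    """Count the rooted binary resolutions of a possibly unresolved tree."""
    if isinstance(tree, str):
        return 1
    # Each polytomy is resolved independently of the others.
    return rooted_binary_topologies(len(tree)) * prod(
        count_resolutions(child) for child in tree
    )

# Hypothetical five-taxon example: one constraint groups A, B and C.
constraint = (("A", "B", "C"), "D", "E")
star = ("A", "B", "C", "D", "E")  # fully unresolved, i.e. no constraint
inside = ((("A", "B"), "C"), ("D", "E"))
outside = ((("A", "D"), "B"), ("C", "E"))

print(is_resolution(inside, constraint))   # True
print(is_resolution(outside, constraint))  # False
print(count_resolutions(star), count_resolutions(constraint))  # 105 9
```

In this toy case the single constraint shrinks the gene tree topology space from 105 to 9 candidates, which illustrates why restricting a sampler to resolutions can reduce the work done during the random walk.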
