Bayesian Inference of Species Networks from Multilocus Sequence Data

Abstract: Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large data sets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.
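To make the idea of a birth-hybridization prior more concrete, the following is a minimal sketch of how the number of species lineages might evolve forward in time under such a process. It rests on assumptions that the abstract does not state: each lineage speciates at a constant rate, each pair of lineages merges through hybridization at a constant rate, and the process starts from a single lineage at the time of origin. The function name and parameters are hypothetical and are not taken from SpeciesNetwork.

```python
import random


def simulate_lineage_counts(speciation_rate, hybridization_rate, origin_time, seed=None):
    """Simulate lineage counts under an assumed birth-hybridization process.

    Assumptions (illustrative only, not the package's implementation):
      * each lineage speciates at rate `speciation_rate`;
      * each pair of lineages hybridizes (merges into one) at rate
        `hybridization_rate`;
      * the process starts with one lineage and runs for `origin_time`.

    Returns a list of (time, lineage_count) pairs, starting with (0.0, 1).
    """
    rng = random.Random(seed)
    t = 0.0
    k = 1
    history = [(t, k)]
    while True:
        spec_total = k * speciation_rate
        # Number of unordered pairs of lineages is k * (k - 1) / 2.
        hyb_total = hybridization_rate * k * (k - 1) / 2
        total = spec_total + hyb_total
        if total <= 0:
            break  # no event can occur
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t >= origin_time:
            break
        if rng.random() < spec_total / total:
            k += 1  # speciation adds a lineage
        else:
            k -= 1  # hybridization joins two lineages into one
        history.append((t, k))
    return history


if __name__ == "__main__":
    for time, count in simulate_lineage_counts(1.0, 0.5, 3.0, seed=1):
        print(f"t={time:.3f}  lineages={count}")
```

Because the hybridization rate is zero when only one lineage exists, the count never drops below one in this sketch. It tracks lineage counts only; a full prior would also have to record the network topology and node heights.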
