Bayesian Inference of Species Networks from Multilocus Sequence Data

Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and a multispecies network coalescent (MSNC) prior is assumed for the embedded gene trees. Through simulations, we verify that our method samples correctly from the posterior distribution and can therefore infer a species network. We reanalyze a large dataset of genes from closely related spruces and confirm the previously suggested homoploid hybridization event in this clade. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides a general framework for Bayesian inference of reticulate evolution.
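
The joint inference described above can be summarized as a factorized posterior. The following is only a sketch under assumed notation, none of which appears in the abstract: $\Psi$ stands for the species network, $G_i$ for the gene tree at locus $i$, $D_i$ for the sequence alignment at that locus, and $L$ for the number of loci. Loci are assumed to be conditionally independent given the network, and parameters such as population sizes, inheritance probabilities, and substitution-model parameters are omitted for brevity.

```latex
% Sketch of the joint posterior over the species network and gene trees.
% Assumed notation: \Psi = species network, G_i = gene tree at locus i,
% D_i = alignment at locus i, L = number of loci.
\begin{equation*}
  P(\Psi, G_1, \dots, G_L \mid D_1, \dots, D_L)
  \;\propto\;
  \underbrace{P(\Psi)}_{\text{birth-hybridization prior}}
  \prod_{i=1}^{L}
  \underbrace{P(G_i \mid \Psi)}_{\text{MSNC prior}}\,
  \underbrace{P(D_i \mid G_i)}_{\text{sequence likelihood}}
\end{equation*}
```

In this reading, the birth-hybridization process supplies $P(\Psi)$, the MSNC supplies each $P(G_i \mid \Psi)$, and the sequence data enter only through the per-locus likelihoods.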
