Algorithmic improvements to species delimitation and phylogeny estimation under the multispecies coalescent

The focus of this article is a Bayesian method for inferring both species delimitations and species trees under the multispecies coalescent model using molecular sequences from multiple loci. The species delimitation requires no a priori assignment of individuals to species, and no guide tree. The method is implemented in a package called STACEY for BEAST2, and is an extension of the author’s DISSECT package. Here we demonstrate considerable efficiency improvements from two changes: three new operators for sampling from the posterior using the Markov chain Monte Carlo algorithm, and a model for the population size parameters along the branches of the species tree that allows these parameters to be integrated out. The correctness of the moves is demonstrated by tests of the implementation. The practice of using a pipeline approach to species delimitation under the multispecies coalescent has been shown to have major problems on simulated data (Olave et al. in Syst Biol 63:263–271. doi:10.1093/sysbio/syt106, 2014). The same simulated data set is used to demonstrate the accuracy and improved convergence of the present method. We also compare performance with *BEAST for a fixed delimitation analysis on a large data set, and again show improved convergence.
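
To illustrate what integrating out a population size parameter can look like, the following is a minimal sketch and not the STACEY implementation. It assumes, purely for illustration, that the coalescent density contributed by one species-tree branch has the form theta^(-q) * exp(-gamma / theta), where q is the number of coalescent events in the branch and gamma is the accumulated coalescent intensity, and that theta has an inverse gamma prior with shape alpha and scale beta. The abstract does not state the prior used, and per-locus ploidy factors are ignored here. Under these assumptions the integral over theta has a closed form, which the sketch checks against a simple numerical quadrature. All function and parameter names are hypothetical.

```python
import math


def log_marginal_closed_form(q, gamma, alpha, beta):
    """Log of the branch density with theta integrated out analytically.

    Assumed model (illustrative only):
      density(theta) = theta**(-q) * exp(-gamma / theta)
      theta ~ InverseGamma(shape=alpha, scale=beta)
    Result: beta**alpha * Gamma(alpha + q) / (Gamma(alpha) * (beta + gamma)**(alpha + q))
    """
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + q)
            - (alpha + q) * math.log(beta + gamma))


def log_marginal_numeric(q, gamma, alpha, beta, n=200_000, upper=200.0):
    """Midpoint-rule approximation of the same integral over (0, upper)."""
    h = upper / n
    log_norm = alpha * math.log(beta) - math.lgamma(alpha)  # prior normaliser
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        log_f = (-q * math.log(theta) - gamma / theta           # branch density
                 + log_norm - (alpha + 1) * math.log(theta)     # prior density
                 - beta / theta)
        total += math.exp(log_f)
    return math.log(total * h)


# Hypothetical values: 3 coalescent events, intensity 1.5, prior shape 2, scale 1.
closed = log_marginal_closed_form(q=3, gamma=1.5, alpha=2.0, beta=1.0)
numeric = log_marginal_numeric(q=3, gamma=1.5, alpha=2.0, beta=1.0)
print(closed, numeric)
```

Because the marginal depends on the branch only through q and gamma, a sampler built on this idea would not need to propose or store theta values for each branch, which is one plausible source of the efficiency gain described above.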
