DISSECT: an assignment-free Bayesian discovery method for species delimitation under the multispecies coalescent

MOTIVATION: The multispecies coalescent model provides a formal framework for the assignment of individual organisms to species, where the species are modeled as the branches of the species tree. None of the approaches available so far simultaneously co-estimates all the relevant parameters in the model without restricting the parameter space by requiring a guide tree and/or a prior assignment of individuals to clusters or species.

RESULTS: We present DISSECT, which explores the full space of possible clusterings of individuals and species tree topologies in a Bayesian framework. To avoid the need for reversible-jump Markov chain Monte Carlo, it uses an approximation in the form of a prior that modifies the birth-death prior for the species tree by incorporating a spike near zero in the density for node heights. The model has two extra parameters: one controls the degree of approximation and the other controls the prior distribution on the number of species. DISSECT is implemented as part of BEAST and requires only a few changes from a standard *BEAST analysis. The method is evaluated on simulated data and demonstrated on an empirical dataset. It is shown to be insensitive to the degree of approximation but quite sensitive to the second parameter, suggesting that large numbers of sequences are needed to draw firm conclusions.

AVAILABILITY AND IMPLEMENTATION: http://tree.bio.ed.ac.uk/software/beast/, http://www.indriid.com/dissectinbeast.html.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
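The idea of a node-height prior with a spike near zero can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration, not the DISSECT formula: the spike is taken to be uniform on [0, collapse_height), an exponential density stands in for the birth-death density, and collapse_height and collapse_weight are hypothetical names for the two extra parameters (degree of approximation and prior on the number of species). Internal nodes whose heights fall below collapse_height are read as collapsed, so the individuals beneath them form one cluster. If the n-1 internal node heights are further assumed to collapse independently, the implied prior on the number of species is a shifted binomial.

```python
import math


def spike_mixture_density(t, collapse_height, collapse_weight, rate):
    """Hypothetical node-height density (not the DISSECT formula).

    Mixes a uniform spike on [0, collapse_height) with an exponential
    density of the given rate, used here as a stand-in for the birth-death
    density. Both components integrate to 1, so the mixture does too.
    """
    if t < 0:
        return 0.0
    spike = 1.0 / collapse_height if t < collapse_height else 0.0
    slab = rate * math.exp(-rate * t)
    return collapse_weight * spike + (1.0 - collapse_weight) * slab


def collapse_probability(collapse_height, collapse_weight, rate):
    """P(t < collapse_height) under spike_mixture_density: the whole spike
    plus the small share of the exponential slab that lies below the cutoff."""
    slab_mass = 1.0 - math.exp(-rate * collapse_height)
    return collapse_weight + (1.0 - collapse_weight) * slab_mass


def count_species(node_heights, collapse_height):
    """Number of clusters implied by the internal node heights of a rooted
    binary time tree, treating nodes below collapse_height as collapsed.

    Parents are older than their children, so the non-collapsed nodes form
    a rooted subtree, and k such nodes cut the tree into k + 1 clusters.
    """
    return 1 + sum(1 for h in node_heights if h >= collapse_height)


def species_count_prior(n_individuals, p_collapse):
    """Prior P(K = k) for k = 1..n, assuming (hypothetically) that each of
    the n - 1 internal nodes collapses independently with probability
    p_collapse."""
    m = n_individuals - 1  # internal nodes of a rooted binary tree
    prior = {}
    for k in range(1, n_individuals + 1):
        j = k - 1  # number of non-collapsed internal nodes
        prior[k] = (math.comb(m, j)
                    * (1.0 - p_collapse) ** j
                    * p_collapse ** (m - j))
    return prior


if __name__ == "__main__":
    heights = [0.001, 0.002, 0.5, 1.0]  # made-up heights for 5 individuals
    print(count_species(heights, collapse_height=0.01))  # 3
    p = collapse_probability(collapse_height=0.01, collapse_weight=0.5, rate=1.0)
    print(species_count_prior(5, p))
```

For example, species_count_prior(5, 0.5) assigns probabilities 1/16, 4/16, 6/16, 4/16 and 1/16 to one through five species, which shows how a single weight parameter can shape the prior on the number of species in this simplified setting.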