Divergence estimation in the presence of incomplete lineage sorting and migration

This article addresses the problem of estimating a species tree from multilocus data in the presence of incomplete lineage sorting (ILS) and migration. I develop a mathematical model of the relevant evolutionary processes, similar to that of IMa2 (Hey 2010), which allows both the population size parameters and the migration rates between pairs of species tree branches to be integrated out. I then describe DENIM (Divergence estimation notwithstanding ILS and migration), a BEAST2 package based on this model, which uses an approximation to sample from the posterior. The approximation assumes that migrations are rare, and it samples only from those regions of the posterior that seem likely under this assumption. The method therefore breaks down when migration is frequent. Using simulations, Leaché et al. (2014) showed that inferring a species tree under the standard multispecies coalescent model can give poor accuracy if migration is present. I reanalyze these simulated data to explore DENIM's performance and demonstrate substantial improvements in accuracy over *BEAST. I also reanalyze an empirical data set.

[1]  C. J-F, et al.  THE COALESCENT, 1980.

[2]  Kevin J. Liu, et al.  Fast and accurate statistical inference of phylogenetic networks using large-scale genomic sequence data, 2017, bioRxiv.

[3]  Graham M. Hughes, et al.  Genome-wide signatures of complex introgression and adaptive evolution in the big cats, 2017, Science Advances.

[4]  M. Plummer, et al.  CODA: convergence diagnosis and output analysis for MCMC, 2006.

[5]  R. Nielsen, et al.  Multilocus Methods for Estimating Population Sizes, Migration Rates and Divergence Time, With Applications to the Divergence of Drosophila pseudoobscura and D. persimilis, 2004, Genetics.

[6]  Brian C. O'Meara, et al.  PHRAPL: Phylogeographic Inference Using Approximate Likelihoods, 2017, Systematic biology.

[7]  Graham Jones, et al.  Algorithmic improvements to species delimitation and phylogeny estimation under the multispecies coalescent, 2017, Journal of mathematical biology.

[8]  Jody Hey, et al.  Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics, 2007, Proceedings of the National Academy of Sciences.

[9]  Ziheng Yang, et al.  The influence of gene flow on species tree estimation: a simulation study, 2014, Systematic biology.

[10]  A. Drummond, et al.  Bayesian Inference of Species Trees from Multilocus Data, 2009, Molecular biology and evolution.

[11]  Peter Beerli, et al.  Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Claudia R. Solís-Lemus, et al.  Inferring Phylogenetic Networks with Maximum Pseudolikelihood under Incomplete Lineage Sorting, 2015, PLoS genetics.

[13]  Luay Nakhleh, et al.  Co-estimating Reticulate Phylogenies and Gene Trees from Multi-locus Sequence Data, 2017, bioRxiv.

[14]  L. Excoffier, et al.  Genomics of Rapid Incipient Speciation in Sympatric Threespine Stickleback, 2016, PLoS genetics.

[15]  Patrik Nosil, et al.  Speciation with gene flow could be common, 2008, Molecular ecology.

[16]  J. Hey.  Isolation with migration models for more than two populations, 2010, Molecular biology and evolution.

[17]  Tanja Stadler, et al.  Bayesian Inference of Species Networks from Multilocus Sequence Data, 2017.

[18]  Jianquan Liu, et al.  Evolutionary history of Purple cone spruce (Picea purpurea) in the Qinghai–Tibet Plateau: homoploid hybrid origin and Pleistocene expansion, 2014, Molecular ecology.

[19]  M. Fujita, et al.  Introgression and phenotypic assimilation in Zimmerius flycatchers (Tyrannidae): population genetic and phylogenetic inferences from genome-wide SNPs, 2014, Systematic biology.

[20]  Graham Jones, et al.  DISSECT: an assignment-free Bayesian discovery method for species delimitation under the multispecies coalescent, 2014, bioRxiv.

[21]  Jeremy M. Brown, et al.  Poor fit to the multispecies coalescent is widely detectable in empirical data, 2014, Systematic biology.

[22]  Bryan C. Carstens, et al.  Posterior predictive checks of coalescent models: P2C2M, an R package, 2016, Molecular ecology resources.

[23]  C. Moritz, et al.  Multilocus phylogenetics of a rapid radiation in the genus Thomomys (Rodentia: Geomyidae), 2008, Systematic biology.

[24]  R. Hudson.  Gene genealogies and the coalescent process, 1990.

[25]  R. Soreng, et al.  Miocene-Pliocene speciation, introgression, and migration of Patis and Ptilagrostis (Poaceae: Stipeae), 2014, Molecular phylogenetics and evolution.

[26]  Dong Xie, et al.  BEAST 2: A Software Platform for Bayesian Evolutionary Analysis, 2014, PLoS Comput. Biol.

[27]  Simon H. Martin, et al.  Genome-wide evidence for speciation with gene flow in Heliconius butterflies, 2013, Genome research.

[28]  Vladimir Makarenkov, et al.  A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis, 2006, Evolutionary bioinformatics online.

[29]  Huw A. Ogilvie, et al.  StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates, 2016.

[30]  L. Kubatko, et al.  Distribution of coalescent histories under the coalescent model with gene flow, 2016, Molecular phylogenetics and evolution.

[31]  P. Beerli, et al.  A Continuous Method for Gene Flow, 2013, Genetics.

[32]  Tianqi Zhu, et al.  Maximum Likelihood Implementation of an Isolation‐with‐Migration Model for Three Species, 2016, Systematic biology.

[33]  Ziheng Yang.  The BPP program for species tree estimation and species delimitation, 2015.