The influence of gene flow on species tree estimation: a simulation study.

Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes that generate gene tree discordance and therefore hinder species tree estimation. Numerous studies have evaluated the impact of ILS on species tree inference, yet the effects of gene flow on species tree inference remain less studied. Here, we simulate and analyse multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow, including the isolation-migration model, the n-island model, gene flow between non-sister species or involving ancestral species, and cases in which species boundaries are crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MP-EST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify those loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies, as gene flow, introgression, or incorrect sample assignments can bias estimates of the species tree topology and of parameters such as population sizes and divergence times.
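The link between gene flow and species tree compression can be illustrated with a minimal sketch. This is not the simulation pipeline used in the study; it is a toy two-species isolation-migration model written for illustration. It assumes one sampled lineage per species, equal population sizes throughout, time measured in coalescent units (a pair of lineages in the same population coalesces at rate 1), and symmetric migration at a constant per-lineage rate. The names `coalescence_time`, `sample_times`, `tau`, and `mig_rate` are hypothetical.

```python
import random


def coalescence_time(tau, mig_rate, rng):
    """Pairwise coalescence time for one lineage from each of two species.

    tau      -- species divergence time in coalescent units (assumed)
    mig_rate -- per-lineage backward migration rate, same units (assumed)
    """
    t = 0.0
    same_pop = False  # the two lineages start in different species
    while True:
        # Either lineage may migrate; coalescence needs both in one population.
        total = 2.0 * mig_rate + (1.0 if same_pop else 0.0)
        if total == 0.0:
            break  # no migration possible: wait for the ancestral population
        wait = rng.expovariate(total)
        if t + wait >= tau:
            break  # the species merge before the next event occurs
        t += wait
        if same_pop and rng.random() < 1.0 / total:
            return t  # coalescence more recent than the species divergence
        same_pop = not same_pop  # a migration event moved one lineage
    # Both lineages are now in the ancestral population (rate-1 coalescence).
    return tau + rng.expovariate(1.0)


def sample_times(tau, mig_rate, n_loci, seed):
    """Simulate independent loci and return their coalescence times."""
    rng = random.Random(seed)
    return [coalescence_time(tau, mig_rate, rng) for _ in range(n_loci)]


if __name__ == "__main__":
    tau = 2.0
    for m in (0.0, 0.5):
        times = sample_times(tau, m, n_loci=5000, seed=1)
        print(f"mig_rate={m}: min={min(times):.3f} "
              f"mean={sum(times) / len(times):.3f}")
```

Without migration, every locus coalesces no earlier than `tau`. With migration, some loci coalesce more recently than the divergence, so a method that assumes no gene flow, and therefore bounds the divergence time by the youngest between-species coalescence, would tend to place the split too recently.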
