The divergence history of European blue mussel species reconstructed from Approximate Bayesian Computation: the effects of sequencing techniques and sampling strategies

Genome-scale diversity data are increasingly available in a variety of biological systems, and can be used to reconstruct the past evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial, and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations computed either from exome capture or RNA-seq, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (the number of individuals, loci, and bins in the jSFS), and show that model selection is robust to variation in the number of individuals and loci. In contrast, the choice of bins used to summarize the jSFS strongly affects the results: including classes of low- and high-frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e., periodic connectivity) and across genes (i.e., genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding jSFS, and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are limited less by the volume of data than by the way in which they are analyzed.
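
To illustrate what binning the jSFS means in practice, the following minimal sketch builds a jSFS from per-site derived-allele counts in two samples and collapses it into coarse classes. The function names, the class labels, and the 0.25/0.75 frequency cut-offs are assumptions chosen for illustration; they are not the bins used in this study.

```python
from collections import Counter


def joint_sfs(counts_a, counts_b, n_a, n_b):
    """Return the jSFS as an (n_a + 1) x (n_b + 1) matrix.

    counts_a[k] and counts_b[k] are the derived-allele counts at site k
    in samples of n_a and n_b chromosomes, respectively.
    """
    sfs = [[0] * (n_b + 1) for _ in range(n_a + 1)]
    for i, j in zip(counts_a, counts_b):
        sfs[i][j] += 1
    return sfs


def bin_jsfs(sfs, low=0.25, high=0.75):
    """Collapse a jSFS into coarse classes (hypothetical binning scheme)."""
    n_a, n_b = len(sfs) - 1, len(sfs[0]) - 1
    classes = Counter()
    for i, row in enumerate(sfs):
        for j, n_sites in enumerate(row):
            if n_sites == 0:
                continue
            fixed_a = i in (0, n_a)  # no polymorphism in sample A
            fixed_b = j in (0, n_b)  # no polymorphism in sample B
            if fixed_a and fixed_b:
                if (i == 0) == (j == 0):
                    continue  # same state in both samples: uninformative
                label = "fixed_difference"
            elif fixed_b:
                label = "private_A"  # polymorphic in A only
            elif fixed_a:
                label = "private_B"  # polymorphic in B only
            else:
                freq_a, freq_b = i / n_a, j / n_b
                if freq_a <= low and freq_b <= low:
                    label = "shared_low"
                elif freq_a >= high and freq_b >= high:
                    label = "shared_high"
                else:
                    label = "shared_mid"
            classes[label] += n_sites
    return dict(classes)


# Toy data: six sites, four chromosomes sampled per species.
counts_a = [1, 0, 4, 1, 3, 2]
counts_b = [0, 2, 0, 1, 3, 2]
sfs = joint_sfs(counts_a, counts_b, n_a=4, n_b=4)
binned = bin_jsfs(sfs)
```

Under this scheme, dropping the `shared_low` and `shared_high` classes would merge all shared polymorphisms into a single bin, which is the kind of coarser summary that the abstract reports as less able to reveal recent migration.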
