An ABC method for whole-genome sequence data: inferring Paleolithic and Neolithic human expansions

Species generally undergo complex demographic histories that include, in particular, multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing these histories, but a crucial step is extracting the relevant information from such very large datasets. Here, we designed an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our approach uses Approximate Bayesian Computation (ABC), a simulation-based statistical framework that allows (i) identifying the best demographic scenario among several competing scenarios, and (ii) estimating the best-fitting parameters under the chosen scenario. ABC relies on the computation of summary statistics. Using a cross-validation approach, we showed that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity, Tajima’s D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrated the importance of simultaneously estimating the genotyping error rate. Finally, applying our method to genome-wide human sequence databases, we showed that a model consisting of a bottleneck followed by a Paleolithic and a Neolithic expansion was the most relevant for Eurasian populations.
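
To make the two ABC steps described above concrete, the following is a minimal rejection-sampling sketch in Python. It is an illustration under stated assumptions, not the method used in this work: the simulator `simulate_summary`, the two toy scenarios ("constant" and "expansion"), the uniform prior, and the two summary statistics (mean and standard deviation of per-locus values) are all hypothetical stand-ins for a coalescent simulator and the haplotype-length, linkage-disequilibrium and classical statistics mentioned in the abstract.

```python
import random
import statistics


def simulate_summary(scenario, theta, rng, n_loci=50):
    """Toy stand-in for a demographic simulator (hypothetical, not a coalescent).

    Returns two summary statistics: mean and standard deviation of per-locus values.
    'constant'  -> exponential draws with mean theta
    'expansion' -> gamma draws (shape 4) with mean theta, i.e. lower variance
    """
    if scenario == "constant":
        draws = [rng.expovariate(1.0 / theta) for _ in range(n_loci)]
    else:
        draws = [rng.gammavariate(4.0, theta / 4.0) for _ in range(n_loci)]
    return (statistics.mean(draws), statistics.pstdev(draws))


def abc_rejection(observed, scenarios, prior, n_sims, keep_frac, rng):
    """Basic ABC rejection: simulate, rank by distance, keep the closest fraction."""
    sims = []
    for _ in range(n_sims):
        scenario = rng.choice(scenarios)   # uniform prior over scenarios
        theta = prior(rng)                 # parameter drawn from its prior
        stats = simulate_summary(scenario, theta, rng)
        # Euclidean distance on statistics scaled by the observed values
        dist = sum(((s - o) / o) ** 2 for s, o in zip(stats, observed)) ** 0.5
        sims.append((dist, scenario, theta))
    sims.sort(key=lambda t: t[0])
    accepted = sims[: max(1, int(keep_frac * n_sims))]

    # (i) scenario choice: share of accepted simulations from each scenario
    post_model = {
        m: sum(1 for _, sc, _ in accepted if sc == m) / len(accepted)
        for m in scenarios
    }
    best = max(post_model, key=post_model.get)

    # (ii) parameter estimation under the chosen scenario: posterior median
    thetas = [th for _, sc, th in accepted if sc == best]
    return post_model, best, statistics.median(thetas)


rng = random.Random(0)
scenarios = ["constant", "expansion"]
# Pseudo-observed data generated under a known scenario and parameter value
observed = simulate_summary("expansion", 2.0, rng)
post_model, best, theta_hat = abc_rejection(
    observed, scenarios, lambda r: r.uniform(0.5, 5.0),
    n_sims=5000, keep_frac=0.02, rng=rng,
)
print(post_model, best, theta_hat)
```

Generating pseudo-observed data under a known scenario, as in the last lines, mirrors the idea of cross-validation: the inference can be checked against the value that produced the data. Practical analyses typically replace plain rejection with regression adjustment or other refinements.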
