Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have recently been developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. They can also be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach, named PopSizeABC, that estimates the evolution of the effective population size through time from a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium in different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. As shown by simulations under a wide range of demographic scenarios, our approach provides accurate estimates of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided that summary statistics are computed from SNPs with common alleles.
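To make the two classes of summary statistics concrete, the following minimal Python sketch computes a folded allele frequency spectrum and an average zygotic linkage disequilibrium per distance bin from unphased genotype counts. It is an illustration only, not the PopSizeABC implementation: the function names, the input layout (one list of 0/1/2 genotype counts per SNP), the use of the squared genotype correlation as the zygotic LD measure, and the bin edges are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def folded_afs(genotypes):
    """Folded allele frequency spectrum from unphased, unpolarized data.

    genotypes: list of SNPs; each SNP is a list with one entry per diploid
    individual, counting copies (0, 1 or 2) of an arbitrarily chosen allele.
    Returns the proportion of polymorphic SNPs whose minor allele count is
    1, 2, ..., n_individuals.
    """
    n_ind = len(genotypes[0])
    n_chrom = 2 * n_ind
    counts = [0] * (n_ind + 1)
    for snp in genotypes:
        c = sum(snp)
        counts[min(c, n_chrom - c)] += 1  # fold: ancestral state not needed
    total = sum(counts[1:])  # ignore monomorphic sites
    return [k / total if total else 0.0 for k in counts[1:]]


def genotype_r2(x, y):
    """Squared correlation between genotype counts at two SNPs (no phase needed)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None  # undefined when one SNP is monomorphic in the sample
    return cov * cov / (vx * vy)


def ld_by_distance(positions, genotypes, bin_edges):
    """Average genotype r^2 over SNP pairs, grouped by physical distance bin.

    Bin b collects pairs with bin_edges[b] <= distance < bin_edges[b + 1].
    """
    values = defaultdict(list)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = abs(positions[j] - positions[i])
            for b in range(len(bin_edges) - 1):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    r2 = genotype_r2(genotypes[i], genotypes[j])
                    if r2 is not None:
                        values[b].append(r2)
                    break
    return {b: mean(v) for b, v in values.items()}


# Toy data: 4 SNPs genotyped in 4 diploid individuals (hypothetical values).
toy_positions = [100, 600, 5000, 9000]
toy_genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 2, 1, 2],
    [0, 0, 0, 1],
]
toy_afs = folded_afs(toy_genotypes)
toy_ld = ld_by_distance(toy_positions, toy_genotypes, [0, 1000, 10000])
```

Both functions work directly on genotype counts, which is why neither phasing nor knowledge of the ancestral allele is required. Restricting the input to SNPs with common alleles, as the abstract recommends for robustness to sequencing errors, would amount to filtering the SNP list on minor allele frequency before calling them.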
