Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences

Disentangling the effect of natural selection on genomic diversity from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees, so their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and on mutation types that do not change GC content (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
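The SNP selection described in the abstract can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Snp` record, the `filter_snps` function, and the assumption that each SNP already carries a local recombination rate in cM/Mb are all hypothetical. The two criteria come from the abstract: a recombination rate above 1.5 cM/Mb, and a mutation type of C↔G or A↔T. Both of these types leave GC content unchanged, which is why gBGC should not favor either allele.

```python
from dataclasses import dataclass

# Allele pairs that leave GC content unchanged (A<->T and C<->G).
GC_CONSERVATIVE_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record; field names are illustrative assumptions."""
    chrom: str
    pos: int
    ref: str
    alt: str
    rec_rate_cm_per_mb: float  # local recombination rate, assumed pre-annotated


def is_gc_conservative(ref: str, alt: str) -> bool:
    """Return True for A<->T or C<->G changes, in either direction."""
    return frozenset((ref.upper(), alt.upper())) in GC_CONSERVATIVE_PAIRS


def filter_snps(snps, min_rec_rate=1.5):
    """Keep SNPs with recombination rate strictly above the threshold
    and a mutation type that does not change GC content."""
    return [
        s for s in snps
        if s.rec_rate_cm_per_mb > min_rec_rate
        and is_gc_conservative(s.ref, s.alt)
    ]


if __name__ == "__main__":
    example = [
        Snp("chr1", 100, "A", "T", 2.0),  # kept: A<->T, high recombination
        Snp("chr1", 200, "A", "G", 2.0),  # dropped: A->G changes GC content
        Snp("chr1", 300, "C", "G", 0.5),  # dropped: recombination too low
        Snp("chr1", 400, "G", "C", 1.6),  # kept: C<->G, high recombination
    ]
    for snp in filter_snps(example):
        print(snp.chrom, snp.pos, snp.ref, snp.alt)
```

The threshold is exposed as a parameter so that the sensitivity of downstream demographic estimates to the cutoff can be explored.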
