Detecting and Measuring Selection from Gene Frequency Data

The recent advent of high-throughput sequencing and genotyping technologies makes it possible to produce, easily and cost-effectively, large amounts of detailed data on the genotype composition of populations. Detecting locus-specific effects may help identify genes that have been, or are currently, targeted by natural selection. How best to identify these selected regions, loci, or single nucleotides remains a challenging issue. Here, we introduce a new model-based method, called SelEstim, to distinguish putatively selected polymorphisms from the background of neutral (or nearly neutral) ones and to estimate the intensity of selection at the former. The underlying population genetic model is a diffusion approximation for the distribution of allele frequency in a population subdivided into a number of demes that exchange migrants. We use a Markov chain Monte Carlo algorithm to sample from the joint posterior distribution of the model parameters, in a hierarchical Bayesian framework. Stochastic simulations show that SelEstim has good power to identify loci targeted by selection and to estimate the strength of selection acting on these loci within each deme. We also reanalyze a subset of SNP data from the Stanford HGDP–CEPH Human Genome Diversity Cell Line Panel to illustrate the performance of SelEstim on real data. In agreement with previous studies, our analyses point to a very strong signal of positive selection upstream of the LCT gene, which encodes the enzyme lactase–phlorizin hydrolase and is associated with adult-type hypolactasia. The geographical distribution of the strength of positive selection across the Old World matches the interpolated map of lactase persistence phenotype frequencies, with the strongest selection coefficients in Europe and in the Indus Valley.
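
To make the idea of a diffusion approximation for allele frequency in a deme receiving migrants more concrete, here is a minimal Python sketch. The abstract does not state the density, so the code assumes a classical Wright-type stationary form, proportional to exp(sigma * x) * x^(M * pi - 1) * (1 - x)^(M * (1 - pi) - 1). The symbols M (scaled migration rate), pi (allele frequency in the migrant pool), and sigma (scaled selection coefficient), as well as both function names, are illustrative assumptions and are not taken from the text above. The sketch only shows how positive selection shifts the expected allele frequency in a deme; it is not the SelEstim implementation.

```python
import math


def wright_log_density(x, M, pi, sigma):
    """Unnormalised log density of allele frequency x in one deme.

    Assumed form (not stated in the abstract):
    exp(sigma * x) * x**(M * pi - 1) * (1 - x)**(M * (1 - pi) - 1)
    """
    return (sigma * x
            + (M * pi - 1.0) * math.log(x)
            + (M * (1.0 - pi) - 1.0) * math.log(1.0 - x))


def mean_frequency(M, pi, sigma, n_grid=20000):
    """Expected allele frequency under the assumed density.

    Uses a midpoint rule on (0, 1), so the endpoints 0 and 1
    (where the log terms are undefined) are never evaluated.
    """
    h = 1.0 / n_grid
    xs = [(k + 0.5) * h for k in range(n_grid)]
    logs = [wright_log_density(x, M, pi, sigma) for x in xs]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    neutral = mean_frequency(M=10.0, pi=0.3, sigma=0.0)
    selected = mean_frequency(M=10.0, pi=0.3, sigma=5.0)
    print(f"neutral mean frequency:  {neutral:.4f}")
    print(f"selected mean frequency: {selected:.4f}")
```

With sigma set to zero the assumed density reduces to a Beta distribution with mean pi, which gives a quick sanity check on the numerical integration. A positive sigma tilts the density toward higher frequencies of the favoured allele.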
