Detecting loci under selection in a hierarchically structured population

Patterns of genetic diversity between populations are often used to detect loci under selection in genome scans. Loci involved in local adaptation are expected to show high FST values, whereas loci under balancing selection are expected to show low FST values. Most FST-based tests of selection use a null distribution generated under a simple island model of population differentiation. Although this model has been shown to be robust, many species have a more complex genetic structure, either because some populations share a recent ancestry or because barriers to gene flow separate different parts of the species range. In this paper, we propose using a hierarchical island model, in which demes exchange more migrants within groups than between groups, to generate the joint distribution of genetic diversity within and between populations. We show that tests that ignore an existing hierarchical structure generate a large excess of false-positive loci, whereas the hierarchical island model is robust to uncertainty about the exact number of groups and of demes per group in the system. Our approach also explicitly takes the mutational process into account rather than relying on allele frequencies alone, which is important for short tandem repeat (STR) data. Applied to human and stickleback STR data sets, it identifies far fewer significant loci than were previously obtained under a non-hierarchical model. Eliminating false-positive loci from genome scans should help to determine more precisely which classes of genes are targeted by selection.
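
As an illustration of the migration structure described above, the following minimal sketch builds a migration matrix in which each deme exchanges more migrants within its own group than with other groups. It is not the implementation used in the paper: the function name, the parameters `m_within` and `m_between`, and the choice to spread each rate evenly over the relevant demes are assumptions made for this example.

```python
def hierarchical_migration_matrix(n_groups, demes_per_group, m_within, m_between):
    """Build a row-stochastic migration matrix for a hierarchical island model.

    Assumed (hypothetical) parameterisation:
      m_within  -- total per-generation migration rate of a deme towards
                   the other demes of its own group
      m_between -- total per-generation migration rate of a deme towards
                   demes belonging to other groups
    Each rate is split evenly among the demes it applies to.
    """
    if n_groups < 2 or demes_per_group < 2:
        raise ValueError("need at least 2 groups and 2 demes per group")
    if m_within < 0 or m_between < 0 or m_within + m_between > 1:
        raise ValueError("migration rates must be non-negative and sum to at most 1")

    n_demes = n_groups * demes_per_group
    same_group_partners = demes_per_group - 1
    other_group_partners = n_demes - demes_per_group

    matrix = []
    for i in range(n_demes):
        row = []
        for j in range(n_demes):
            if i == j:
                # Probability that a lineage stays in its own deme.
                row.append(1.0 - m_within - m_between)
            elif i // demes_per_group == j // demes_per_group:
                # Demes i and j belong to the same group.
                row.append(m_within / same_group_partners)
            else:
                # Demes i and j belong to different groups.
                row.append(m_between / other_group_partners)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Illustrative values only: 3 groups of 4 demes.
    m = hierarchical_migration_matrix(3, 4, m_within=0.05, m_between=0.005)
    print("within-group pairwise rate: ", m[0][1])
    print("between-group pairwise rate:", m[0][4])
```

A matrix of this kind could serve as input to a coalescent or forward simulator that generates a null distribution of FST. Setting `m_within / (demes_per_group - 1)` equal to `m_between / (n_demes - demes_per_group)` makes every pairwise rate identical, which recovers the simple (non-hierarchical) island model as a special case.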
