The search for loci under selection: trends, biases and progress

FST outlier analysis (OA) and environmental association analysis (EAA) are popular approaches for detecting genetic variants under selection, and they provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA and their increasing attractiveness for detecting signatures of selection, their application to field‐based empirical data has not been synthesized. Here, we review 66 empirical studies that use single nucleotide polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters and environmental variables, and we assess their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and the number of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies and decreased with the number of SNPs analysed. Genomic sampling effort had a greater impact than biological sampling effort on the proportion of SNPs identified as under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid the transparency and comparability of SNP‐based selection studies and help to advance landscape and evolutionary genomics.
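To make the quantity "proportion of SNPs identified as outliers" concrete, the following is a minimal sketch, not the procedure used by any of the reviewed studies. It assumes biallelic SNPs, equally weighted populations and Wright's FST computed from per-population allele frequencies, and it flags loci with a fixed, arbitrary cutoff. Published OA tools instead compare each locus against a neutral null model. The function names `per_locus_fst` and `outlier_proportion`, the cutoff and the example frequencies are all hypothetical.

```python
from statistics import mean


def per_locus_fst(freqs):
    """Wright's FST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population
    (populations are weighted equally; an assumption of this sketch).
    """
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic locus: no differentiation to measure
    h_s = mean([2 * p * (1 - p) for p in freqs])  # mean within-population
    return (h_t - h_s) / h_t


def outlier_proportion(allele_freqs, threshold):
    """Flag loci whose FST exceeds a fixed cutoff (a naive stand-in for a
    null-model test) and return the FSTs, flagged indices and proportion."""
    fsts = [per_locus_fst(f) for f in allele_freqs]
    flagged = [i for i, f in enumerate(fsts) if f > threshold]
    return fsts, flagged, len(flagged) / len(fsts)


# Hypothetical allele frequencies: 3 SNPs x 2 populations.
example = [[0.5, 0.5], [0.1, 0.9], [0.4, 0.6]]
fsts, flagged, proportion = outlier_proportion(example, threshold=0.5)
print(flagged, round(proportion, 3))
```

Because the proportion is the flagged count divided by the total number of SNPs analysed, comparisons among studies depend on both values being reported.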
