Sequencing era methods for identifying signatures of selection in the genome

Insights into genetic loci that are under selection, and into their functional roles, contribute to an increased understanding of the patterns of phenotypic variation we observe today. The availability of whole-genome sequence data, for humans and other species, provides opportunities to investigate adaptation and evolution at unprecedented resolution. Many analytical methods have been developed to interrogate these large data sets and characterize signatures of selection in the genome. Here we review recently developed methods and consider the impact of increased computing power and data availability on the detection of selection signatures. Accounting for demography, recombination and other confounding factors is important, and using a range of methods in combination is a powerful route to resolving different forms of selection in genome sequence data. Overall, methods for application to whole-genome sequencing have improved substantially, although further work is required to develop robust and computationally efficient approaches that may increase reproducibility across studies.
