On the well-founded enthusiasm for soft sweeps in humans: a reply to Harris, Sackman, and Jensen

A challenging but central question in population genetics is the detection of genomic regions underpinning recent adaptation. To this end, we recently devised a machine learning method, termed S/HIC, which detects both “hard” selective sweeps on de novo mutations and “soft” sweeps on standing genetic variation with high sensitivity and specificity, while being exceptionally robust to demographic model misspecification. We previously applied S/HIC to human population genomic data and uncovered evidence of a large number of recent selective sweeps across the genome, most of which we classified as soft sweeps. A critique of recent efforts to detect soft sweeps, including our own, has argued that S/HIC is in fact so vulnerable to demographic misspecification that our analyses with it should be completely discounted. Through a careful consideration of the claims of this critique, we argue that the impact of such misspecification on our analysis in humans is minimal with respect to our conclusions. The critique in question also argued that our false discovery rate in humans was essentially 100%; however, we show that this inaccurate claim is due to a regrettable error on the part of its authors. We argue that our scan for selection has produced several interesting observations on recent adaptation in humans that are highly concordant with independent efforts to detect signatures of more ancient positive selection. We conclude that the evidence for the utility of S/HIC, and the validity of our application of it to human data, is highly compelling, and that strictly demographic explanations for our results are clearly untenable.
