Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning

Identifying partial sweeps, which include both hard and soft sweeps that have not yet reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method for discovering selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network, a model widely used for image processing, trained on a large suite of summary statistics computed from coalescent simulations that incorporate population-specific demographic history. The classifier distinguishes completed from partial sweeps, hard from soft sweeps, and regions directly affected by selection from those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to evaluate partialS/HIC’s performance and find that it has excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, revealing both continent-wide patterns and sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach can distinguish completed from partial sweeps, as well as hard from soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for selection scan tools that can track in-progress evolutionary dynamics.
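The abstract does not specify how the summary statistics are arranged as input to the convolutional network. As an illustration only, the Python sketch below assumes an S/HIC-style layout: each statistic is computed in several contiguous subwindows of a genomic window, and each statistic's row is divided by its own sum, so the row encodes the relative shape of the signal across the window rather than its absolute level. Only two statistics are included (nucleotide diversity π and Watterson's θ). The function names, the splitting of subwindows by SNP index rather than physical position, and the nine-label class set (a neutral class plus the three contrasts above crossed together) are hypothetical and are not taken from partialS/HIC itself.

```python
"""Illustrative sketch (not partialS/HIC's code): build an S/HIC-style
feature "image" of normalized summary statistics across subwindows."""
from itertools import product
from typing import Callable, Dict, List, Sequence

# Hypothetical label set: a neutral class plus every combination of the three
# contrasts described in the abstract. Names are illustrative only.
COMPLETENESS = ("complete", "partial")
MODE = ("hard", "soft")
POSITION = ("sweep", "linked")  # "sweep" = directly selected, "linked" = flanking


def class_labels() -> List[str]:
    """Return the hypothetical 9-class label set."""
    return ["neutral"] + [
        f"{c}_{m}_{p}" for c, m, p in product(COMPLETENESS, MODE, POSITION)
    ]


def pairwise_diversity(haps: Sequence[str]) -> float:
    """Nucleotide diversity (pi): mean pairwise differences between haplotypes."""
    n = len(haps)
    total = sum(
        sum(a != b for a, b in zip(haps[i], haps[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def watterson_theta(haps: Sequence[str]) -> float:
    """Watterson's theta: segregating sites divided by a_1 = sum_{i=1}^{n-1} 1/i."""
    n = len(haps)
    segregating = sum(len(set(column)) > 1 for column in zip(*haps))
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating / a1


STATISTICS: Dict[str, Callable[[Sequence[str]], float]] = {
    "pi": pairwise_diversity,
    "theta_w": watterson_theta,
}


def feature_image(haps: Sequence[str], n_subwindows: int) -> List[List[float]]:
    """Rows = statistics, columns = subwindows; each row is divided by its sum.

    haps: equal-length strings of 0/1 alleles, one character per SNP.
    Subwindows are split by SNP index here (an assumption for simplicity).
    """
    if len(haps) < 2:
        raise ValueError("need at least two haplotypes")
    n_snps = len(haps[0])
    bounds = [round(k * n_snps / n_subwindows) for k in range(n_subwindows + 1)]
    image = []
    for stat in STATISTICS.values():
        row = [
            stat([h[bounds[k]:bounds[k + 1]] for h in haps])
            for k in range(n_subwindows)
        ]
        total = sum(row)
        # Use a flat row when the statistic is zero everywhere (avoids 0/0).
        image.append([v / total if total > 0 else 1.0 / n_subwindows for v in row])
    return image


if __name__ == "__main__":
    haplotypes = ["0000", "0001", "0010", "0011"]
    print(class_labels())
    print(feature_image(haplotypes, n_subwindows=2))
```

In the usage example, the first two SNPs are monomorphic and all variation sits in the last two, so both rows of the resulting image are `[0.0, 1.0]`: every unit of diversity falls in the second subwindow. A real classifier of this kind would use many more statistics and subwindows, with labeled images generated from simulations rather than hand-written haplotypes.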