Inferring the landscape of recombination using recurrent neural networks

Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here we describe ReLERNN, a deep learning method for accurately estimating a genome-wide recombination landscape using as few as four samples. Rather than use summaries of linkage disequilibrium as its input, ReLERNN considers columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, while largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.
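As a rough illustration of what treating alignment columns as a sequence can look like, the sketch below feeds each column of a toy genotype matrix, in genomic order, through a small recurrent layer and reads out one number at the end. It is a minimal sketch under stated assumptions, not ReLERNN's architecture. The function name `rnn_over_alignment`, the plain tanh recurrence, the hidden size, and the random untrained weights are all hypothetical choices made for illustration.

```python
import numpy as np


def rnn_over_alignment(genotypes, hidden_size=8, seed=0):
    """Run an untrained, Elman-style recurrent layer over alignment columns.

    genotypes: array of shape (n_samples, n_sites) holding 0/1 alleles.
    Returns one scalar read out from the final hidden state. The weights
    are random, so the value is meaningless until the network is trained.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    if genotypes.ndim != 2:
        raise ValueError("genotypes must be a 2-D (samples x sites) array")
    n_samples, n_sites = genotypes.shape

    # Hypothetical, randomly initialised parameters (illustration only).
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(hidden_size, n_samples))
    w_rec = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    w_out = rng.normal(scale=0.1, size=hidden_size)

    # One time step per site: the input at step j is column j, i.e. the
    # alleles carried by every sample at that site.
    hidden = np.zeros(hidden_size)
    for j in range(n_sites):
        column = genotypes[:, j]
        hidden = np.tanh(w_in @ column + w_rec @ hidden)

    return float(w_out @ hidden)


# Toy alignment: four samples (rows) by six segregating sites (columns).
toy_alignment = np.array([
    [0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
])
print(rnn_over_alignment(toy_alignment))
```

Because the hidden state is carried from one site to the next, the readout depends on the order of the columns along the genome, which is the property a recurrent model offers over an unordered summary of the same data.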
