A hidden Markov model approach for simultaneously estimating local ancestry and admixture time using next generation sequence data in samples of arbitrary ploidy

Admixture, the mixing of genomes from divergent populations, is increasingly appreciated as a central process in evolution. To characterize and quantify patterns of admixture across the genome, a number of methods have been developed for local ancestry inference. However, existing approaches have several shortcomings. First, all local ancestry inference methods require some prior assumption about the expected ancestry tract lengths. Second, existing methods generally require genotypes, which are not feasible to obtain for many next-generation sequencing projects. Third, many methods assume samples are diploid, but a wide variety of sequencing applications fail to meet this assumption. To address these issues, we introduce a novel hidden Markov model for estimating local ancestry that models the read pileup data rather than genotypes, is generalized to arbitrary ploidy, and can estimate the time since admixture during local ancestry inference. We demonstrate that our method can simultaneously estimate the time since admixture and local ancestry with good accuracy, and that it performs well on samples of high ploidy (i.e., 100 or more chromosomes). As this method is very general, we expect it will be useful for local ancestry inference in a wider variety of populations than has previously been possible. We then applied our method to pooled sequencing data derived from populations of Drosophila melanogaster on an ancestry cline on the east coast of North America. We find that local recombination rates are negatively correlated with the proportion of African ancestry, suggesting that selection against foreign ancestry is least efficient in low-recombination regions. Finally, we show that clinal outlier loci are enriched for genes associated with gene regulatory functions, consistent with a role of regulatory evolution in ecological adaptation of admixed D. melanogaster populations. Our results illustrate the potential of local ancestry inference for elucidating fundamental evolutionary processes.

Author Summary

When divergent populations hybridize, their offspring obtain portions of their genomes from each parent population. Although the average ancestry proportion in each descendant is equal to the proportion of ancestors from each of the ancestral populations, the contribution of each ancestry type is variable across the genome. Estimating local ancestry within admixed individuals is a fundamental goal for evolutionary genetics, and here we develop a method for doing this that circumvents many of the problems associated with existing methods. Briefly, our method can use short read data rather than genotypes, and can be applied to samples with any number of chromosomes. Furthermore, our method simultaneously estimates local ancestry and the number of generations since admixture, that is, the time at which the two ancestral populations first encountered each other. Finally, in applying our method to data from an admixture zone between ancestral populations of Drosophila melanogaster, we find many lines of evidence consistent with natural selection operating against the introduction of foreign ancestry into populations of one predominant ancestry type. Because of the generality of this method, we expect that it will be useful for a wide variety of existing and ongoing research projects.
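To make the idea of estimating admixture time from read data concrete, the following is a minimal sketch and not the model described above. It assumes a single haploid chromosome, two ancestral populations, a single admixture pulse t generations ago, known alternate-allele frequencies in each ancestral population, and a fixed sequencing error rate. Under those assumptions ancestry persists between adjacent sites with probability exp(-r t), where r is the genetic distance in Morgans. All function names, the input layout, and the parameter defaults are hypothetical and chosen only for illustration.

```python
import math

def emission(n_ref, n_alt, freq_alt, err=0.01):
    """P(read counts | ancestry) for one haploid chromosome (illustrative).

    freq_alt is the assumed alternate-allele frequency in the ancestral
    population; err is an assumed per-read error rate.
    """
    n = n_ref + n_alt
    comb = math.comb(n, n_alt)
    # Reads are binomial given the allele the chromosome actually carries.
    p_if_alt = comb * (1 - err) ** n_alt * err ** n_ref
    p_if_ref = comb * err ** n_alt * (1 - err) ** n_ref
    return freq_alt * p_if_alt + (1 - freq_alt) * p_if_ref

def log_likelihood(sites, t, m):
    """Forward-algorithm log-likelihood of the reads given time t.

    sites: list of (dist_morgans, n_ref, n_alt, freq_pop0, freq_pop1),
           where dist_morgans is the distance from the previous site.
    m:     assumed ancestry proportion from population 0.
    """
    prior = (m, 1 - m)
    fwd = None
    loglik = 0.0
    for dist, n_ref, n_alt, f0, f1 in sites:
        emit = (emission(n_ref, n_alt, f0), emission(n_ref, n_alt, f1))
        if fwd is None:
            pred = prior
        else:
            # With probability `stay` no recombination occurred since
            # admixture; otherwise ancestry is redrawn from the prior.
            stay = math.exp(-dist * t)
            pred = tuple(stay * fwd[k] + (1 - stay) * prior[k] for k in range(2))
        unnorm = (pred[0] * emit[0], pred[1] * emit[1])
        scale = unnorm[0] + unnorm[1]
        loglik += math.log(scale)
        fwd = (unnorm[0] / scale, unnorm[1] / scale)  # rescale to avoid underflow
    return loglik

def estimate_time(sites, m, grid):
    """Return the candidate time in `grid` with the highest likelihood."""
    return max(grid, key=lambda t: log_likelihood(sites, t, m))
```

Calling `estimate_time(sites, m=0.5, grid=range(1, 2001))` would return the grid value that maximizes the likelihood. Extending the sketch to higher ploidy would require hidden states that count the chromosomes of each ancestry and emissions that account for sampling reads from a pool, which is omitted here.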
