Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes

Adaptive archaic hominin genes

As they migrated out of Africa and into Europe and Asia, anatomically modern humans interbred with archaic hominins, such as Neanderthals and Denisovans. The effect of this genetic introgression on the recipient populations has been of considerable interest, especially in cases of selection for specific archaic genetic variants. Hsieh et al. characterized adaptive structural variants and copy number variants that are likely targets of positive selection in Melanesians. Focusing on population-specific regions of the genome that carry duplicated genes and show an excess of amino acid replacements provides evidence for one of the mechanisms by which genetic novelty can arise and result in differentiation between human genomes.

Science, this issue p. eaax2083

Melanesians carry adaptive DNA variants derived from archaic hominins.

INTRODUCTION
Characterizing genetic variants underlying local adaptations in human populations is one of the central goals of evolutionary research. Most studies have focused on adaptive single-nucleotide variants that either arose as new beneficial mutations or were introduced after interbreeding with our now-extinct relatives, including Neanderthals and Denisovans. The adaptive role of copy number variants (CNVs), another well-known form of genomic variation generated through deletions or duplications that affect more base pairs in the genome than single-nucleotide variants do, is less well understood, despite evidence that such mutations are subject to stronger selective pressures.

RATIONALE
This study focuses on the discovery of introgressed and adaptive CNVs that have become enriched in specific human populations. We combine whole-genome CNV calling and population genetic inference methods to discover CNVs and then assess signals of selection after controlling for demographic history.
We examine 266 publicly available modern human genomes from the Simons Genome Diversity Project and genomes of three ancient hominins: a Denisovan, a Neanderthal from the Altai Mountains in Siberia, and a Neanderthal from Croatia. We apply long-read sequencing methods to sequence-resolve complex CNVs of interest specifically in the Melanesians, an Oceanian population distributed from Papua New Guinea to as far east as the islands of Fiji and known to harbor some of the greatest amounts of Neanderthal and Denisovan ancestry.

RESULTS
Consistent with the hypothesis of archaic introgression outside Africa, we find a significant excess of CNV sharing between modern non-African populations and archaic hominins (P = 0.039). Among Melanesians, we observe an enrichment of CNVs with potential signals of positive selection (n = 37 CNVs), of which 19 were likely introgressed from archaic hominins. We show that Melanesian-stratified CNVs are significantly associated with signals of positive selection (P = 0.0323). Many map near or within genes associated with metabolism (e.g., ACOT1 and ACOT2), development and cell cycle or signaling (e.g., TNFRSF10D, CDK11A, and CDK11B), or immune response (e.g., IFNLR1). We characterize two of the largest and most complex CNVs, on chromosomes 16p11.2 and 8p21.3, that introgressed from Denisovans and Neanderthals, respectively, and are absent from most other human populations. At chromosome 16p11.2, we sequence-resolve a large duplication of >383 thousand base pairs (kbp) that originated from Denisovans and introgressed into the ancestral Melanesian population 60,000 to 170,000 years ago. This large duplication occurs at high frequency (>79%) in diverse Melanesian groups, shows signatures of positive selection, and maps adjacent to Homo sapiens–specific duplications that predispose to rearrangements associated with autism.
On chromosome 8p21.3, we identify a Melanesian haplotype that carries two CNVs (a ~6-kbp deletion and a ~38-kbp duplication), has a Neanderthal origin, and introgressed into non-Africans 40,000 to 120,000 years ago. This CNV haplotype occurs at high frequency (44%) and shows signals consistent with a partial selective sweep in Melanesians. Using long-read genomic and transcriptomic sequencing data, we reconstruct the structure and complex evolutionary history for these two CNVs and discover previously undescribed duplicated genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that show an excess of amino acid replacements consistent with the action of positive selection.

CONCLUSION
Our results suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation that is absent from current reference genomes.

Large adaptive-introgressed CNVs at chromosomes 8p21.3 and 16p11.2 in Melanesians. The magnifying glasses highlight structural differences between the archaic (top) and reference (bottom) genomes. Neanderthal (red) and Denisovan (blue) haplotypes encompassing large CNVs occur at high frequencies in Melanesians (44 and 79%, respectively) but are absent (black) in all non-Melanesians. These CNVs create positively selected genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that are absent from the reference genome.

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively.
Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.
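
As an illustration of what a population-stratified CNV means in practice, the sketch below computes V_ST, a variance-based analogue of F_ST that is commonly applied to copy number data: V_ST = (V_T - V_S) / V_T, where V_T is the variance of copy number across all individuals and V_S is the size-weighted mean of the within-population variances. The text above does not name the stratification statistic that was used, so the choice of V_ST, the function name `vst`, and the toy copy number values are all assumptions made for illustration only.

```python
from statistics import pvariance


def vst(copy_numbers_by_pop):
    """Return V_ST = (V_T - V_S) / V_T for one CNV locus.

    copy_numbers_by_pop maps a population label to a list of per-individual
    copy number estimates (hypothetical input format). Values near 1 mean
    that almost all copy number variance lies between populations.
    """
    all_values = [cn for values in copy_numbers_by_pop.values() for cn in values]
    total_var = pvariance(all_values)  # V_T: variance across everyone
    if total_var == 0:
        return 0.0  # no variation at all, so no stratification
    n_total = len(all_values)
    # V_S: within-population variances weighted by population size
    within_var = sum(
        len(values) * pvariance(values) for values in copy_numbers_by_pop.values()
    ) / n_total
    return (total_var - within_var) / total_var


if __name__ == "__main__":
    # Toy data (invented): a duplication common in one group only.
    toy = {"group_A": [2, 2, 3, 3], "group_B": [4, 4, 5, 5]}
    print(f"V_ST = {vst(toy):.2f}")
```

A locus whose copy number differs sharply between groups but varies little within them yields a value close to 1, whereas a locus with the same distribution in every group yields 0. Real analyses would work from genome-wide read-depth genotypes and compare each locus against the empirical distribution, which this sketch does not attempt.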
