Population Structure, Stratification and Introgression of Human Structural Variation in the HGDP

Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, yet they remain understudied. Here, we present a comprehensive analysis of deletions, duplications, inversions and non-reference unique insertions in the Human Genome Diversity Project (HGDP-CEPH) panel, a high-coverage dataset of 910 samples from 54 diverse worldwide populations. We identify 61,801 structural variants in total, of which 61% are novel. Some reach high frequency and are private to continental groups or even individual populations, including a deletion in the maltase-glucoamylase gene MGAM, involved in starch digestion, in the South American Karitiana, and a deletion in SIGLEC5 in the Central African Mbuti, potentially increasing susceptibility to autoimmune diseases. We discover a dynamic range of copy number expansions and find cases of regionally restricted runaway duplications, for example, 18 copies near the olfactory receptor OR7D2 in East Asia and in the clinically relevant HCAR2 in Central Asia. We identify highly stratified variants putatively introgressed from Neanderthals or Denisovans, some of which, like a deletion within AQR in Papuans, are almost fixed in individual populations. Finally, by de novo assembly of 25 genomes using linked-read sequencing, we discover 1,631 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. These insertions show population structure and some reside in functional regions, illustrating the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
