A refined characterization of large-scale genomic differences in the first complete human genome

The release of the first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) was a milestone in human genomics. The T2T-CHM13 assembly extends our understanding of telomeres, centromeres, segmental duplications, and other complex regions. The current human reference genome (GRCh38) has been widely used in human genomic studies. However, the large-scale genomic differences between these two important assemblies have not yet been characterized in detail. Here, we identify 590 discrepant regions (∼226 Mbp) in total. In addition to the previously reported ‘non-syntenic’ regions, we identify 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed web tool (SynPlotter). The discrepant regions outside telomeres and centromeres (∼20.4 Mbp) are highly structurally polymorphic in humans, and copy number variations in these regions are likely associated with various human diseases and disease susceptibility, such as immune and neurodevelopmental disorders. Analysis of a newly identified discrepant region, the KLRC gene cluster, shows that the depletion of KLRC2 by a single deletion event is associated with natural killer cell differentiation in ∼20% of humans. Meanwhile, the rapid amino acid replacements within KLRC3 are consistent with the action of natural selection during primate evolution. Our study advances the understanding of large-scale structural differences between these two crucial human reference genomes and will aid the interpretation of future studies of human genetic variation.
