Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
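One way to picture a "spectrum of tolerance to inactivation" is to compare the number of pLoF variants observed in a gene with the number a mutation rate model would predict for it. The sketch below assumes that framing, which the abstract itself does not spell out. The `GeneCounts` type, the function names, the gene labels, and all counts are hypothetical toy values chosen for illustration. They are not gnomAD data and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneCounts:
    """Toy per-gene summary; all field values used below are made up."""
    name: str
    observed_plof: int    # pLoF variants seen in the cohort (hypothetical)
    expected_plof: float  # pLoF variants predicted by a mutation model (hypothetical)


def oe_ratio(gene: GeneCounts) -> float:
    """Observed/expected pLoF ratio; lower values suggest stronger depletion."""
    if gene.expected_plof <= 0:
        raise ValueError(f"expected count must be positive for {gene.name}")
    return gene.observed_plof / gene.expected_plof


def rank_by_intolerance(genes: List[GeneCounts]) -> List[GeneCounts]:
    """Order genes from most depleted (least tolerant) to least depleted."""
    return sorted(genes, key=oe_ratio)


# Illustrative input only: three invented genes with invented counts.
toy_genes = [
    GeneCounts("GENE_A", observed_plof=2, expected_plof=40.0),
    GeneCounts("GENE_B", observed_plof=30, expected_plof=32.0),
    GeneCounts("GENE_C", observed_plof=12, expected_plof=24.0),
]

ranked_names = [g.name for g in rank_by_intolerance(toy_genes)]
for g in rank_by_intolerance(toy_genes):
    print(f"{g.name}: observed/expected = {oe_ratio(g):.3f}")
```

A point estimate like this is noisy for small genes with few expected variants, so a practical analysis would likely also attach a confidence interval to each ratio before ranking.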
