A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes

Defective Gene Detective

Identifying genes that give rise to diseases is one of the major goals of sequencing human genomes. However, putative loss-of-function variants, which are often among the first targets identified by genome and exome sequencing, have frequently turned out to be sequencing errors rather than true genetic variants. To establish the true scope of loss-of-function variation in the human genome, MacArthur et al. (p. 823; see the Perspective by Quintana-Murci) extensively validated the putative loss-of-function variants in genomes from the 1000 Genomes Project, as well as in one additional European individual. They found that the average person carries about 100 true loss-of-function alleles, approximately 20 of which are present in two copies. Because many known disease-causing variants were found in “normal” individuals, clinical sequencing studies need to reassess how they identify likely causative alleles.

Validation of predicted nonfunctional alleles in the human genome affects the medical interpretation of genomic analyses.

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants, with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes, and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
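The distinction drawn above between the LoF variants a genome carries (~100) and the genes it has completely inactivated (~20) can be made concrete with a small per-genome tally. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the `LofCall` record, its phased haplotype flags, and `summarize_genome` are hypothetical names, and it adopts one plausible definition of "completely inactivated", namely that both haplotypes carry at least one LoF allele (homozygous, or phased compound heterozygous).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LofCall:
    """One LoF variant call in one genome (hypothetical record layout)."""
    gene: str
    variant_id: str
    hap1: bool  # True if the LoF allele is on haplotype 1
    hap2: bool  # True if the LoF allele is on haplotype 2


def summarize_genome(calls):
    """Return (number of distinct LoF variants, set of inactivated genes).

    Assumption: a gene is completely inactivated only when each of its two
    haplotypes carries at least one LoF allele.
    """
    n_variants = len({c.variant_id for c in calls})
    # Per gene, track whether each haplotype has been hit by any LoF allele.
    hit = defaultdict(lambda: [False, False])
    for c in calls:
        hit[c.gene][0] |= c.hap1
        hit[c.gene][1] |= c.hap2
    inactivated = {g for g, (h1, h2) in hit.items() if h1 and h2}
    return n_variants, inactivated


# Toy data for illustration only; not drawn from the study.
example_calls = [
    LofCall("GENE_A", "v1", True, True),   # homozygous LoF
    LofCall("GENE_B", "v2", True, False),  # heterozygous only
    LofCall("GENE_C", "v3", True, False),  # compound heterozygous:
    LofCall("GENE_C", "v4", False, True),  #   alleles on different haplotypes
    LofCall("GENE_D", "v5", True, False),  # two LoF alleles in cis:
    LofCall("GENE_D", "v6", True, False),  #   one intact copy remains
]

n, genes = summarize_genome(example_calls)
print(n, sorted(genes))  # 6 ['GENE_A', 'GENE_C']
```

Two LoF alleles on the same haplotype (GENE_D) still leave one intact copy of the gene, which is why phase information matters when counting completely inactivated genes rather than simply counting genes with two LoF alleles.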
