Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes

A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to quantify, on a genome-wide scale, how well genes tolerate functional genetic variation. Here we describe an effort that uses data from 6503 whole-exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether a gene has relatively more or less functional genetic variation than expected given the apparently neutral variation found in that gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, though intolerance varies strikingly among genes causing different classes of genetic disease. We conclude by showing that an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.
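The idea of scoring a gene by how much functional variation it carries relative to expectation can be illustrated with a minimal sketch. The abstract does not specify the statistical procedure, so everything below is an assumption made for illustration: the per-gene counts are invented toy numbers, the gene names are hypothetical, the expectation comes from an ordinary least-squares line fitted across genes, and the score is simply the residual scaled by the residual standard deviation. Under this convention a negative score means fewer functional variants than expected (more intolerant) and a positive score means more than expected (more tolerant).

```python
from statistics import mean, pstdev


def intolerance_scores(counts):
    """Score genes by departure from the expected functional variant count.

    counts maps a gene name to (neutral_variants, functional_variants).
    The expected functional count is taken from a least-squares line fitted
    across all genes (an assumed model, not necessarily the authors' method).
    """
    names = list(counts)
    x = [counts[g][0] for g in names]  # apparently neutral variants per gene
    y = [counts[g][1] for g in names]  # functional variants per gene

    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("neutral counts must vary across genes to fit a line")

    # Ordinary least-squares slope and intercept.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Residual = observed minus expected functional variation.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Scale by the residual standard deviation so scores are comparable;
    # fall back to 1.0 if every gene sits exactly on the line.
    scale = pstdev(residuals) or 1.0
    return {g: r / scale for g, r in zip(names, residuals)}


# Hypothetical toy data: gene -> (neutral variants, functional variants).
toy_counts = {
    "GENE_A": (10, 2),
    "GENE_B": (20, 8),
    "GENE_C": (30, 9),
    "GENE_D": (40, 17),
    "GENE_E": (50, 14),
}

scores = intolerance_scores(toy_counts)

# Rank from most intolerant (most negative score) to most tolerant.
for gene in sorted(scores, key=scores.get):
    print(f"{gene}\t{scores[gene]:+.2f}")
```

Ranking genes by such a score, rather than using the raw count of functional variants, keeps genes with a lot of variation overall from looking tolerant merely because of their size or mutability. A real analysis would also need to define which variants count as common, functional, and neutral, which this sketch leaves open.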
