The human gene damage index as a gene-level approach to prioritizing exome variants

Significance The protein-coding exome of a patient with a monogenic disease contains about 20,000 variations, of which only one or two are disease causing. When attempting to select disease-causing candidate mutation(s), a challenge is to filter out as many false-positive (FP) variants as possible. In this study, we describe the gene damage index (GDI), a metric for the nonsynonymous mutational load in each protein-coding gene in the general population. We show that the GDI is an efficient gene-level method for filtering out FP variants in genes that are highly damaged in the general population.

Abstract The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance, and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.
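The gene-level filtering idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gene names, GDI scores, variant labels, and the 20% cutoff are all hypothetical placeholders, and the only assumption carried over from the text is that variants in the most highly damaged genes are treated as likely false positives.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    gene: str    # gene symbol carrying the variant
    change: str  # free-text label for the variant (hypothetical)


def gdi_cutoff(gdi_by_gene: Dict[str, float], top_fraction: float) -> float:
    """Return the lowest GDI score among the most-damaged `top_fraction` of genes."""
    scores = sorted(gdi_by_gene.values(), reverse=True)
    n_flagged = max(1, int(len(scores) * top_fraction))
    return scores[n_flagged - 1]


def filter_variants(
    variants: List[Variant],
    gdi_by_gene: Dict[str, float],
    top_fraction: float,
) -> Tuple[List[Variant], List[Variant]]:
    """Split variants into (kept, removed) using a gene-level GDI cutoff.

    Variants in genes at or above the cutoff are removed as likely false
    positives. Genes without a score are kept, because there is no
    gene-level evidence against them.
    """
    cutoff = gdi_cutoff(gdi_by_gene, top_fraction)
    kept: List[Variant] = []
    removed: List[Variant] = []
    for v in variants:
        score = gdi_by_gene.get(v.gene)
        if score is not None and score >= cutoff:
            removed.append(v)
        else:
            kept.append(v)
    return kept, removed


# Hypothetical scores for ten placeholder genes (not real GDI values).
example_gdi = {
    "GENE_A": 0.5, "GENE_B": 1.2, "GENE_C": 3.4, "GENE_D": 10.0,
    "GENE_E": 55.0, "GENE_F": 2.0, "GENE_G": 0.1, "GENE_H": 7.5,
    "GENE_I": 980.0, "GENE_J": 4.4,
}

# Hypothetical patient variants, including one gene with no score.
example_variants = [
    Variant("GENE_I", "variant_1"),
    Variant("GENE_C", "variant_2"),
    Variant("GENE_E", "variant_3"),
    Variant("GENE_Z", "variant_4"),
]

kept, removed = filter_variants(example_variants, example_gdi, top_fraction=0.2)
print("kept:", [v.gene for v in kept])
print("removed:", [v.gene for v in removed])
```

In this toy input, the two highest-scoring genes are flagged and their variants are set aside, while the variant in the unscored gene is retained. A real analysis would take gene scores and cutoffs from the published GDI resource rather than from placeholder values.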
