Characterising and Predicting Haploinsufficiency in the Human Genome

Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We compiled a map of 1,079 haplosufficient (HS) genes by systematically identifying genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals, and contrasted the genomic, evolutionary, functional, and network properties of these HS genes with those of known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes also exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network, HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions by demonstrating that genes with a high predicted probability of haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We transformed these gene-based predictions into haploinsufficiency scores for genic deletions, which we demonstrate discriminate between pathogenic and benign deletions better than either deletion size or the number of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
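The abstract does not state how gene-based predictions are combined into a deletion-level score. As an illustration only, the sketch below shows one plausible approach: if each deleted gene is assumed to be haploinsufficient independently of the others, the probability that a deletion removes at least one HI gene is one minus the product of the per-gene complements, which can then be reported as a log-odds value. The function names, the independence assumption, and the log-odds form are assumptions made here for illustration, not details taken from the text above.

```python
import math


def deletion_hi_probability(gene_probs):
    """Probability that at least one deleted gene is haploinsufficient.

    Assumes (hypothetically) that per-gene HI probabilities are independent.
    """
    prob_none = 1.0
    for p in gene_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        prob_none *= 1.0 - p
    return 1.0 - prob_none


def deletion_log_odds(gene_probs, eps=1e-12):
    """Log10 odds of the deletion-level probability, clipped to avoid log(0)."""
    p = deletion_hi_probability(gene_probs)
    p = min(max(p, eps), 1.0 - eps)
    return math.log10(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical per-gene probabilities for a three-gene deletion.
    example = [0.05, 0.60, 0.10]
    print(deletion_hi_probability(example))
    print(deletion_log_odds(example))
```

Under this scheme a deletion containing one high-probability gene can outscore a larger deletion spanning many low-probability genes, which is the kind of behaviour that would let a score outperform deletion size or gene count alone.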
