Haploinsufficiency predictions without study bias

Any given human individual carries multiple genetic variants that disrupt protein-coding genes, through structural variation as well as single-nucleotide variants and indels. Predicting the phenotypic consequences of a gene disruption remains a significant challenge. Current approaches use information from a range of biological networks to predict which human genes are haploinsufficient (meaning two copies are required for normal function) or essential (meaning at least one copy is required for viability). Using recently available study gene sets, we show that these approaches are strongly biased towards providing accurate predictions for well-studied genes. By contrast, we derive a haploinsufficiency score from a combination of unbiased large-scale high-throughput datasets, including gene co-expression and genetic variation in over 6000 human exomes. For genes currently unassociated with any paper listed in PubMed, our approach provides haploinsufficiency predictions for more than twice as many genes as three commonly used approaches, and it outperforms these approaches in predicting haploinsufficiency for less-studied genes. We also show that fine-tuning the predictor on a set of well-studied ‘gold standard’ haploinsufficient genes does not improve the prediction for less-studied genes. This new score can readily be used to prioritize gene disruptions resulting from any genetic variant, including copy number variants, indels and single-nucleotide variants.
