GenNet framework: interpretable neural networks for phenotype prediction

Deep learning is rarely used in population genomics because of its computational burden and the difficulty of interpreting neural networks. Here, we propose GenNet, a novel open-source deep learning framework for predicting phenotypes from genetic variants. In this framework, interpretable and memory-efficient neural network architectures are constructed by embedding biological knowledge from public databases, resulting in neural networks that contain only biologically plausible connections. We applied the framework to seventeen phenotypes from a case-control study, a population-based study and the UK Biobank. Interpreting the networks revealed well-replicated genes such as HERC2 and OCA2 for hair and eye color, and novel genes such as ZNF773 and PCNT for schizophrenia. For schizophrenia, the framework also obtained an area under the curve (AUC) of 0.74 in the held-out test set and identified ubiquitin-mediated proteolysis, endocrine system and viral infectious diseases as the most predictive biological pathways. GenNet is a freely available, end-to-end deep learning framework that allows researchers to develop and use interpretable neural networks to obtain novel insights into the genetic architecture of complex traits and diseases.
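
The idea of a network that contains only biologically plausible connections can be illustrated with a small sketch. This is not the GenNet implementation; it is a minimal NumPy example that assumes a hypothetical SNP-to-gene annotation (`snp_to_gene`, with made-up values) and uses it as a binary mask, so that each gene node receives input only from the variants annotated to it. All names below are illustrative.

```python
import numpy as np

# Hypothetical annotation: entry i gives the gene index of SNP i.
# In practice this would come from an annotation resource; the values
# here are made up for illustration.
snp_to_gene = np.array([0, 0, 1, 1, 1, 2])
n_snps = snp_to_gene.size
n_genes = int(snp_to_gene.max()) + 1

# Binary connectivity mask: mask[i, j] = 1 only if SNP i is annotated to gene j.
mask = np.zeros((n_snps, n_genes))
mask[np.arange(n_snps), snp_to_gene] = 1.0

rng = np.random.default_rng(0)
weights = rng.normal(size=(n_snps, n_genes))
bias = np.zeros(n_genes)


def gene_layer(genotypes):
    """Masked linear layer: each gene node sees only its own SNPs."""
    return np.tanh(genotypes @ (weights * mask) + bias)


# Genotypes coded as 0/1/2 allele counts for two individuals.
genotypes = np.array([[0, 1, 2, 0, 1, 2],
                      [2, 2, 0, 1, 0, 0]], dtype=float)
gene_activations = gene_layer(genotypes)

# Number of connections kept by the mask versus a fully connected layer.
n_connections = int(mask.sum())
dense_connections = n_snps * n_genes
```

Because every weight belongs to exactly one SNP-gene pair, each learned weight can be read as the contribution of a variant to its gene. The dense mask is used here only for readability; a memory-efficient version would store just the nonzero entries, for example in a sparse matrix.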
