Towards Probabilistic Generative Models Harnessing Graph Neural Networks for Disease-Gene Prediction

Disease-gene prediction (DGP) is the computational task of predicting associations between genes and diseases. Effective solutions to the DGP problem could accelerate the early stages of the therapeutic development pipeline by efficiently prioritizing candidate genes for various diseases. In this work, we introduce the variational graph auto-encoder (VGAE) as a promising unsupervised approach for learning powerful latent embeddings in disease-gene networks that can be used for the DGP problem; it is the first approach to DGP that uses a generative model involving graph neural networks (GNNs). We further propose an extension, the constrained-VGAE (C-VGAE), which adapts the learning algorithm for link prediction between two distinct node types in heterogeneous graphs. We evaluate the VGAE on general link prediction in a disease-gene association network and the C-VGAE on disease-gene prediction in the same network, using popular random-walk-driven methods as baselines, and demonstrate the effectiveness of both. Although the methodology presented relies solely on the topology of a disease-gene association network, it can be further enhanced through the integration of additional biological networks, such as gene/protein interaction networks, and of additional biological features of the diseases and genes represented in the disease-gene association network.
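To make the idea concrete, the following is a minimal NumPy sketch of a single VGAE forward pass on a toy disease-gene graph, with a two-layer graph convolutional encoder and an inner-product decoder in the style of the original VGAE formulation. The abstract does not specify how the C-VGAE constrains learning, so the disease-gene pair mask used in the loss below is only one plausible reading and an assumption of this sketch, not the authors' method. The toy graph, the weight shapes, and all function names (`normalize_adjacency`, `vgae_forward`, `bipartite_mask`, `masked_neg_elbo`) are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_forward(adj, features, params, rng):
    """Encode nodes to Gaussian latents, sample, decode edge probabilities."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ params["w0"], 0.0)  # ReLU layer
    mu = a_norm @ hidden @ params["w_mu"]
    log_sigma = a_norm @ hidden @ params["w_sigma"]
    # Reparameterization trick: z = mu + sigma * noise
    z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
    probs = 1.0 / (1.0 + np.exp(-(z @ z.T)))  # inner-product decoder
    return probs, mu, log_sigma


def bipartite_mask(is_disease):
    """True only for pairs that join one disease node and one gene node."""
    is_disease = np.asarray(is_disease, dtype=bool)
    return is_disease[:, None] != is_disease[None, :]


def masked_neg_elbo(adj, probs, mu, log_sigma, mask):
    """Reconstruction loss on masked pairs plus a KL term scaled by 1/N."""
    eps = 1e-9
    bce = -(adj * np.log(probs + eps) + (1.0 - adj) * np.log(1.0 - probs + eps))
    recon = bce[mask].mean()
    kl = -0.5 * np.mean(
        np.sum(1.0 + 2.0 * log_sigma - mu**2 - np.exp(2.0 * log_sigma), axis=1)
    )
    return recon + kl / adj.shape[0]


# Toy graph (assumed): nodes 0-1 are diseases, nodes 2-4 are genes.
is_disease = [True, True, False, False, False]
edges = [(0, 2), (0, 3), (1, 3), (1, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
features = np.eye(5)  # featureless setting: one-hot node identities
params = {
    "w0": 0.1 * rng.standard_normal((5, 8)),
    "w_mu": 0.1 * rng.standard_normal((8, 4)),
    "w_sigma": 0.1 * rng.standard_normal((8, 4)),
}

probs, mu, log_sigma = vgae_forward(adj, features, params, rng)
mask = bipartite_mask(is_disease)
loss = masked_neg_elbo(adj, probs, mu, log_sigma, mask)
print("masked negative ELBO:", loss)
```

In a real setting the weights would be trained by minimizing this loss with an automatic differentiation framework, and the decoded probabilities for unobserved disease-gene pairs would serve as candidate rankings. Using identity features mirrors the topology-only setting the abstract describes; richer biological features would replace `features`.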
