Multipath2vec: Predicting Pathogenic Genes via Heterogeneous Network Embedding

Phenotypically similar diseases have been shown to be associated with specific genes. Predicting disease genes is important for disease prevention, diagnosis, and treatment. In this work, we address this problem and propose a method for predicting disease-causing genes called Multipath2vec. First, we construct a heterogeneous network, called the GP-network, from three kinds of relationships between genes and phenotypes: interactions between genes, correlations between phenotypes, and known gene-phenotype pairs. Then, we propose the multi-path, which guides random walks in the GP-network in order to better embed the network. Finally, we use the resulting vector representation of each protein and phenotype to calculate and rank the similarities between candidate genes and the target phenotype. We evaluate Multipath2vec and two baseline approaches (CATAPULT and PRINCE) on whole gene-phenotype data, single-gene gene-phenotype data, and many-genes gene-phenotype data. Under leave-one-out cross-validation, Multipath2vec achieves better results than the baseline approaches. To the best of our knowledge, this is the first attempt to apply a heterogeneous network embedding method to the prediction of pathogenic genes. The superior experimental results of Multipath2vec suggest that network representation methods are a promising direction for disease-causing gene prediction.
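The pipeline described above (a heterogeneous network, type-guided random walks, then similarity ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the multi-path, so the sketch assumes it can be approximated by a cyclic pattern of node types such as "GP" or "GGP"; the node names, the toy edges, and the use of cosine similarity are likewise assumptions made for illustration. The embedding step itself (for example, skip-gram training on the walks) is omitted, and the vectors passed to the ranking function are placeholders.

```python
import math
import random

# Hypothetical toy GP-network: names starting with "g" are genes, "p" are phenotypes.
EDGES = [
    ("g1", "g2"), ("g2", "g3"),   # gene-gene interactions
    ("p1", "p2"),                 # phenotype-phenotype correlation
    ("g1", "p1"), ("g3", "p2"),   # known gene-phenotype pairs
]


def node_type(node):
    """Return 'G' for a gene node and 'P' for a phenotype node (naming assumption)."""
    return "G" if node.startswith("g") else "P"


def build_adjacency(edges):
    """Build an undirected adjacency list from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def guided_walk(adj, start, pattern, length, rng):
    """Random walk in which the i-th node must have type pattern[i % len(pattern)].

    The walk stops early if the current node has no neighbour of the required type.
    """
    if node_type(start) != pattern[0]:
        raise ValueError("start node type must match the first symbol of the pattern")
    walk = [start]
    while len(walk) < length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type(n) == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(vectors, phenotype, candidates):
    """Rank candidate genes by similarity to the target phenotype, best first."""
    scores = [(g, cosine(vectors[g], vectors[phenotype])) for g in candidates]
    return sorted(scores, key=lambda item: item[1], reverse=True)


adjacency = build_adjacency(EDGES)
rng = random.Random(0)
# Generate walks under several type patterns, standing in for the multi-path.
walks = [guided_walk(adjacency, "g1", pattern, 6, rng) for pattern in ("GP", "GGP")]

# Placeholder vectors; in practice these would be learned from the walks.
toy_vectors = {"p1": [1.0, 0.0], "g1": [1.0, 0.1], "g2": [0.0, 1.0]}
ranking = rank_candidates(toy_vectors, "p1", ["g1", "g2"])
```

In a fuller pipeline, the generated walks would be fed to an embedding model to obtain the vectors, and the ranking would be evaluated with leave-one-out cross-validation by hiding one known gene-phenotype pair at a time.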
