A drug information embedding method based on graph convolution neural network

New drug development is an extremely time-consuming and high-risk process [1]. The biomedical industry therefore places great value on drug repositioning, that is, on fully exploring new uses for existing drugs [2]. The central research problem in drug repositioning is how to identify drug-disease pairs with a potential therapeutic relationship among a large number of unverified pairs. Machine learning models can increase the enrichment of potential drug-disease pairs and reduce the false positive rate of prediction. In the past few years, a series of graph convolutional network models have been developed to compute latent feature representations of nodes and links. Researchers worldwide have studied network embedding techniques for biomedical data extensively and have obtained a number of important results. The methods used fall into two categories: traditional machine learning algorithms based on manual feature extraction, and methods based on deep learning. For example, Kipf and Welling [3] proposed a graph convolutional network (GCN) that, like parts of existing models such as DeepDR [4] and DTINet [5], relies on node features and their connections and can be used for node classification. To address the imbalance of drug information data samples, this paper proposes a drug repositioning method based on deep learning over a multi-source heterogeneous network.
Traditional feature extraction methods have several limitations: they depend heavily on the experience and knowledge of medical staff, are highly subjective, take considerable time and effort, and often have difficulty extracting high-quality discriminative features. To avoid these limitations, this paper uses a graph convolutional encoder model and a variational auto-encoder neural network to automatically learn low-dimensional features of multi-source heterogeneous drug networks and to perform drug repositioning through drug-disease association prediction.
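
The combination of a graph convolutional encoder and a variational auto-encoder described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the propagation rule of Kipf and Welling and swaps in an inner-product decoder as used in variational graph auto-encoders. The toy graph, layer sizes, untrained random weights, and all function names are hypothetical and serve only to show the data flow.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the renormalized adjacency of a GCN."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # degree >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gcn_layer(a_norm, h, w, activation=None):
    """One graph convolution: aggregate neighbour features, then project."""
    out = a_norm @ h @ w
    return activation(out) if activation is not None else out

def encode(a_norm, x, params, rng):
    """GCN encoder producing a sampled embedding via the reparameterization trick."""
    hidden = gcn_layer(a_norm, x, params["w0"], relu)
    mu = gcn_layer(a_norm, hidden, params["w_mu"])
    log_var = gcn_layer(a_norm, hidden, params["w_logvar"])
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    return z, mu, log_var

def decode(z_drug, z_disease):
    """Inner-product decoder: score every drug-disease pair in (0, 1)."""
    return sigmoid(z_drug @ z_disease.T)

# Hypothetical toy network: nodes 0-2 are drugs, nodes 3-4 are diseases.
n_drugs, n_nodes = 3, 5
adj = np.zeros((n_nodes, n_nodes))
for drug, disease in [(0, 3), (1, 3), (2, 4)]:   # known associations
    adj[drug, disease] = adj[disease, drug] = 1.0

rng = np.random.default_rng(0)
x = np.eye(n_nodes)                               # featureless nodes: one-hot input
params = {                                        # untrained random weights
    "w0": 0.1 * rng.standard_normal((n_nodes, 4)),
    "w_mu": 0.1 * rng.standard_normal((4, 2)),
    "w_logvar": 0.1 * rng.standard_normal((4, 2)),
}

a_norm = normalize_adjacency(adj)
z, mu, log_var = encode(a_norm, x, params, rng)
scores = decode(z[:n_drugs], z[n_drugs:])         # shape (3 drugs, 2 diseases)
print(scores.round(3))
```

In a trained model the weights would be fitted by maximizing the reconstruction likelihood of known associations plus a KL regularizer on `mu` and `log_var`; high scores for unobserved drug-disease pairs would then be read as repositioning candidates.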

[1] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[2] M. de Rijke, et al. A Collective Variational Autoencoder for Top-N Recommendation with Side Information, 2018, DLRS@RecSys.

[3] Olivier Bodenreider, et al. Gene expression correlation and gene ontology-based similarity: an assessment of quantitative relationships, 2004, 2004 Symposium on Computational Intelligence in Bioinformatics and Computational Biology.

[4] David S. Wishart, et al. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs, 2010, Nucleic Acids Res.

[5] Richard Bonneau, et al. deepNF: deep network fusion for protein function prediction, 2017, bioRxiv.

[6] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[7] Tin Kam Ho, et al. The Random Subspace Method for Constructing Decision Forests, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[8] Zhiyuan Liu, et al. Graph Neural Networks: A Review of Methods and Applications, 2018, AI Open.

[9] Chuan Shi, et al. Adversarial Learning on Heterogeneous Information Networks, 2019, KDD.

[10] Max Welling, et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.

[11] Nada Lavrac, et al. Deep Node Ranking: an Algorithm for Structural Network Embedding and End-to-End Classification, 2019, ArXiv.

[12] Nagarajan Natarajan, et al. Inductive matrix completion for predicting gene–disease associations, 2014, Bioinform.

[13] David S. Wishart, et al. DrugBank 5.0: a major update to the DrugBank database for 2018, 2017, Nucleic Acids Res.

[14] Chirag J Patel, et al. A standard database for drug repositioning, 2017, Scientific Data.

[15] Peer Bork, et al. The SIDER database of drugs and side effects, 2015, Nucleic Acids Res.

[16] Jian Pei, et al. A Survey on Network Embedding, 2017, IEEE Transactions on Knowledge and Data Engineering.

[17] Xiangrong Liu, et al. deepDR: a network-based deep learning approach to in silico drug repositioning, 2019, Bioinform.

[18] Christos Faloutsos, et al. Fast Random Walk with Restart and Its Applications, 2006, Sixth International Conference on Data Mining (ICDM'06).

[19] Fei Wang, et al. Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders, 2018, IJCAI.

[20] Ethem Alpaydin, et al. Multiple Kernel Learning Algorithms, 2011, J. Mach. Learn. Res.

[21] Palash Goyal, et al. Graph Embedding Techniques, Applications, and Performance: A Survey, 2017, Knowl. Based Syst.

[22] Jure Leskovec, et al. Representation Learning on Graphs: Methods and Applications, 2017, IEEE Data Eng. Bull.

[23] Sandhya Rani, et al. Human Protein Reference Database—2009 update, 2008, Nucleic Acids Res.

[24] Tao Jiang, et al. NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions, 2018, bioRxiv.

[25] Mathias Niepert, et al. Learning Convolutional Neural Networks for Graphs, 2016, ICML.

[26] Joan Bruna, et al. Deep Convolutional Networks on Graph-Structured Data, 2015, ArXiv.

[27] Thomas C. Wiegers, et al. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015, 2014, Nucleic Acids Res.

[28] Sangkeun Lee, et al. Random walk based entity ranking on graph for multidimensional recommendation, 2011, RecSys '11.

[29] Samuel Kaski, et al. Kernelized Bayesian Matrix Factorization, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.