Prediction of lncRNA-Disease Associations from a Heterogeneous Information Network Based on the DeepWalk Embedding Model

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs longer than 200 nucleotides. A large number of studies have shown that lncRNAs are involved in various biological processes in the human body and play an important role in the occurrence, development, and treatment of human diseases. However, identifying lncRNA-disease associations with traditional experimental methods is time-consuming and laborious. In this paper, we propose a novel computational method to predict lncRNA-disease associations based on a heterogeneous information network. Specifically, the heterogeneous information network is constructed by integrating known associations among drugs, proteins, lncRNAs, miRNAs, and diseases. The network embedding method DeepWalk (Online Learning of Social Representations) is then employed to learn vector representations of the nodes in this network. Finally, a random forest classifier is trained on these representations to predict associations between lncRNAs and diseases. The proposed method achieves an average AUC of 0.8171 under five-fold cross-validation. The experimental results show that our method outperforms existing approaches, suggesting that it can be a useful tool for predicting disease-related lncRNAs.
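To make the DeepWalk step more concrete, the following is a minimal sketch of its first stage only: generating truncated random walks over a heterogeneous association network. It assumes an unweighted, undirected graph built from pairwise associations. The node names, toy edges, and walk parameters (`num_walks`, `walk_length`) are hypothetical and chosen purely for illustration. In DeepWalk the resulting walks are treated as "sentences" and passed to a skip-gram model (for example, gensim's `Word2Vec` with `sg=1`) to obtain node vectors. That training step, the construction of lncRNA-disease pair features, and the random forest classifier are not shown here.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list from (node, node) association pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def random_walk(adj, start, walk_length, rng):
    """One truncated random walk: each step moves to a uniformly chosen neighbor."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def deepwalk_corpus(adj, num_walks, walk_length, seed=0):
    """Start num_walks walks from every node, shuffling the start order each round."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, walk_length, rng))
    return walks


# Hypothetical toy network mixing the five node types (not real data).
edges = [
    ("lncRNA_A", "disease_X"),
    ("lncRNA_A", "miRNA_1"),
    ("miRNA_1", "disease_Y"),
    ("protein_P", "lncRNA_B"),
    ("drug_D", "protein_P"),
    ("lncRNA_B", "disease_X"),
]
adj = build_graph(edges)
walks = deepwalk_corpus(adj, num_walks=3, walk_length=6, seed=42)
```

Shuffling the start order in every round follows the original DeepWalk procedure; here it only changes the order of the generated corpus, but it matters when the skip-gram model is trained incrementally on the walks.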