BioERP: biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions

MOTIVATION Predicting relationships among entities can greatly benefit important biomedical problems. Recently, a large number of biomedical heterogeneous networks (BioHNs) have been generated, offering opportunities to develop network-based learning approaches for predicting relationships among entities. However, current research has only slightly explored BioHN-based self-supervised representation learning methods, and existing methods struggle to capture local- and global-level association information among entities simultaneously.

RESULTS In this study, we propose a biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions, termed BioERP. A self-supervised meta path detection mechanism is proposed to train a deep Transformer encoder model that can capture the global structure and semantic features in BioHNs. Meanwhile, a biomedical entity mask learning strategy is designed to reflect local associations of vertices. Finally, the representations from the different task models are concatenated to generate two-level representation vectors for predicting relationships among entities. The results on eight datasets show that BioERP outperforms 30 state-of-the-art methods. In particular, BioERP achieves AUC and AUPR values close to 1 on drug-target interaction prediction. In summary, BioERP is a promising bio-entity relationship prediction approach.

AVAILABILITY Source code and data can be downloaded from https://github.com/pengsl-lab/BioERP.git.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
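The final step described in RESULTS, concatenating the representations from the two task models into a two-level vector, can be illustrated with a minimal sketch. This is not the released BioERP code: the function names, the toy embeddings, and the choice to build a pair feature by concatenating the two entities' vectors are all assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def two_level_representation(
    global_emb: Dict[str, Vector],
    local_emb: Dict[str, Vector],
    entity: str,
) -> Vector:
    """Concatenate the global- and local-level vectors of one entity.

    `global_emb` stands in for the output of the meta path detection model,
    `local_emb` for the output of the entity mask learning model
    (both are hypothetical lookup tables here).
    """
    return global_emb[entity] + local_emb[entity]


def pair_features(
    global_emb: Dict[str, Vector],
    local_emb: Dict[str, Vector],
    source: str,
    target: str,
) -> Vector:
    """Feature vector for a candidate relationship.

    Assumption: the pair is represented by concatenating the two-level
    vectors of its two entities; the abstract does not specify this step.
    """
    return (
        two_level_representation(global_emb, local_emb, source)
        + two_level_representation(global_emb, local_emb, target)
    )


# Toy, made-up embeddings for one drug and one protein.
global_emb = {"drug_A": [0.1, 0.2], "protein_X": [0.3, 0.4]}
local_emb = {"drug_A": [0.5], "protein_X": [0.6]}

features = pair_features(global_emb, local_emb, "drug_A", "protein_X")
```

In this sketch, `features` would then be passed to any downstream classifier that scores whether the two entities are related.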
