Paper information - Continual representation learning for evolving biomedical bipartite networks

MOTIVATION: Many real-world biomedical interactions, such as 'gene-disease', 'disease-symptom', and 'drug-target', are modeled as bipartite networks. Learning meaningful representations for such networks is a fundamental problem in Network Representation Learning (NRL). NRL approaches aim to translate the network structure into low-dimensional vector representations that are useful for a variety of biomedical applications. Despite significant advances, existing approaches still have limitations. First, most of them do not model the unique topological properties of bipartite networks, so applying them directly to bipartite graphs yields unsatisfactory results. Second, existing approaches typically learn representations from static networks. This is limiting for biomedical bipartite networks, which evolve at a rapid pace and thus call for approaches that can update the representations in an online fashion.

RESULTS: In this research, we propose a novel representation learning approach that accurately preserves the intricate bipartite structure and efficiently updates the node representations. Specifically, we design a customized autoencoder that captures the proximity relationship between nodes participating in bipartite bicliques (2 × 2 sub-graphs), while preserving both the global and local structures. Moreover, the proposed structure-preserving technique is interleaved with the central tenets of continual machine learning to design an incremental learning strategy that updates the node representations in an online manner. Taken together, the proposed approach produces meaningful representations with high fidelity and computational efficiency. Extensive experiments conducted on several biomedical bipartite networks validate the effectiveness and rationality of the proposed approach.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
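
To make the 2 × 2 biclique structure named in the abstract concrete, the following minimal Python sketch enumerates such sub-graphs in a small bipartite edge list. It is only an illustration of the structure, not the paper's autoencoder or its incremental update strategy. The function name `find_2x2_bicliques` and the toy drug-target edges are assumptions introduced here for illustration.

```python
from itertools import combinations


def find_2x2_bicliques(edges):
    """Return every 2 x 2 biclique as ((u1, u2), (v1, v2)), members sorted."""
    # Map each left-side node to the set of right-side nodes it is linked to.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)

    bicliques = []
    # Two left nodes form a 2 x 2 biclique with every pair of right nodes
    # that both of them are connected to.
    for u1, u2 in combinations(sorted(neighbors), 2):
        shared = sorted(neighbors[u1] & neighbors[u2])
        for v1, v2 in combinations(shared, 2):
            bicliques.append(((u1, u2), (v1, v2)))
    return bicliques


# Hypothetical toy drug-target network (names are made up for illustration).
toy_edges = [
    ("drugA", "target1"), ("drugA", "target2"),
    ("drugB", "target1"), ("drugB", "target2"),
    ("drugC", "target2"), ("drugC", "target3"),
]
toy_bicliques = find_2x2_bicliques(toy_edges)
print(toy_bicliques)
```

In this toy network only `drugA` and `drugB` share two targets, so they form the single 2 × 2 biclique. A structure-preserving embedding in the spirit of the abstract would be expected to place such co-participating nodes close together.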
