Network embedding in biomedical data science

Owing to the rapid development of computer technologies, an increasing amount of relational data has been emerging in modern biomedical research. Many network-based learning methods have been proposed to analyze such data; they provide a deep understanding of the topology and knowledge behind biomedical networks and benefit many applications in human healthcare. However, most network-based methods suffer from high computational and space costs, and challenges remain in handling the high dimensionality and sparsity of biomedical networks. Recent advances in network embedding provide an effective new paradigm for network analysis: network embedding maps a network into a low-dimensional space while preserving its structural properties as much as possible. Downstream tasks such as link prediction and node classification can then be performed by traditional machine learning methods. In this survey, we conduct a comprehensive review of the literature on applying network embedding to advance the biomedical domain. We first briefly introduce widely used network embedding models. We then discuss in detail how network embedding approaches have been applied to biomedical networks and how they have accelerated downstream tasks in biomedical science. Finally, we discuss the challenges faced by existing network embedding applications in biomedical domains and suggest several promising future directions for further improving human healthcare.
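As a minimal sketch of the idea that an embedding turns a network into vectors that ordinary machine learning can consume, the Python code below embeds a toy graph with an unnormalized Laplacian-eigenmap-style spectral embedding, then reuses the vectors for link prediction and node classification. Spectral embedding is only one classic technique, chosen here for illustration; it is not presented as the method of any particular surveyed work. The toy graph and the helper names `spectral_embedding`, `link_score`, and `predict_label` are hypothetical, and the sketch assumes an undirected, unweighted, connected graph small enough for dense matrices.

```python
import math
import random


def laplacian(n, edges):
    """Unnormalized graph Laplacian L = D - A as a dense list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def spectral_embedding(n, edges, dim=2, iters=500, seed=0):
    """Map each node to `dim` coordinates taken from the eigenvectors of L
    with the smallest non-zero eigenvalues (assumes a connected graph).

    Power iteration on M = c*I - L finds L's smallest eigenvectors; each
    iterate is re-orthogonalized against the constant vector and against
    eigenvectors already found (deflation).
    """
    L = laplacian(n, edges)
    c = 2.0 * max(L[i][i] for i in range(n))  # 2 * max degree bounds L's spectrum
    rng = random.Random(seed)
    basis = [[1.0 / math.sqrt(n)] * n]  # trivial eigenvector, eigenvalue 0
    for _ in range(dim):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            # y = (c*I - L) x
            y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
            for b in basis:  # remove components along known eigenvectors
                proj = sum(yi * bi for yi, bi in zip(y, b))
                y = [yi - proj * bi for yi, bi in zip(y, b)]
            norm = math.sqrt(sum(yi * yi for yi in y))
            x = [yi / norm for yi in y]
        basis.append(x)
    # Row i is node i's low-dimensional vector (skip the trivial eigenvector).
    return [[basis[k + 1][i] for k in range(dim)] for i in range(n)]


def link_score(emb, u, v):
    """Higher score means closer in embedding space, i.e. more likely linked."""
    return -math.dist(emb[u], emb[v])


def predict_label(emb, labeled, node):
    """1-nearest-neighbour node classification on the embedding vectors."""
    nearest = min(labeled, key=lambda m: math.dist(emb[m], emb[node]))
    return labeled[nearest]


# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
emb = spectral_embedding(6, edges, dim=1)
print(link_score(emb, 0, 1) > link_score(emb, 0, 5))  # True
print(predict_label(emb, {0: "A", 5: "B"}, 2))  # A
```

Distance-based link scoring and nearest-neighbour labeling are deliberately simple stand-ins for the traditional machine learning methods mentioned above; any standard classifier or regressor could consume the same vectors.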