A Survey on Heterogeneous Graph Embedding: Methods, Techniques, Applications and Sources

Heterogeneous graphs (HGs), also known as heterogeneous information networks, have become ubiquitous in real-world scenarios. As a result, HG embedding, which aims to learn representations in a lower-dimensional space while preserving the heterogeneous structures and semantics for downstream tasks (e.g., node/graph classification, node clustering, link prediction), has drawn considerable attention in recent years. In this survey, we perform a comprehensive review of recent developments in HG embedding methods and techniques. We first introduce the basic concepts of HGs and discuss the unique challenges that heterogeneity poses for HG embedding in comparison with homogeneous graph representation learning. We then systematically survey and categorize the state-of-the-art HG embedding methods based on the information they use in the learning process to address these challenges. In particular, for each representative HG embedding method, we provide a detailed introduction and analyze its pros and cons; we also explore, for the first time, the transformativeness and applicability of different types of HG embedding methods in real-world industrial environments. In addition, we present several widely deployed systems that have demonstrated the success of HG embedding techniques in resolving real-world application problems with broader impacts. To facilitate future research and applications in this area, we also summarize the open-source code, existing graph learning platforms, and benchmark datasets. Finally, we explore additional issues and challenges of HG embedding and forecast future research directions in this field.
