CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling

Heterogeneous information network (HIN) embedding aims to find representations of nodes that preserve the proximity between entities of different natures. A widely adopted family of approaches applies random walks to generate sequences of heterogeneous context, from which the embedding is learned. However, due to the multipartite structure of HINs, hub nodes tend to be overrepresented in the sampled sequences, giving rise to imbalanced samples of the network. Here we propose a new embedding method, CoarSAS2hvec. Self-avoiding short-sequence sampling combined with an HIN coarsening procedure (CoarSAS) is used to better collect the rich information in an HIN, and an optimized loss function is used to improve the performance of the HIN structure embedding. CoarSAS2hvec outperforms nine other methods on two different tasks across four real-world data sets. An ablation study confirms that the samples collected by CoarSAS contain richer information about the network than those collected by other methods, as characterized by a higher information entropy. Consequently, a traditional loss function applied to CoarSAS samples also yields improved results. Our work addresses a limitation of random-walk-based HIN embedding that has not been emphasized before, which can shed light on a range of problems in HIN analyses.
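
The sketch below illustrates, in a minimal and hedged form, the two ideas the abstract emphasizes: a self-avoiding short-sequence sampler that prevents repeated visits to hub nodes, and the Shannon entropy of the resulting node-occurrence distribution as a proxy for how rich the collected samples are. It is not the authors' implementation; the graph representation, walk length, and function names are illustrative assumptions, and the coarsening step is omitted.

```python
# Illustrative sketch only: self-avoiding short-sequence sampling plus an
# entropy measure of the samples. Graph format, walk length, and names are
# assumptions, not the paper's actual code.
import random
from collections import Counter
from math import log2

def self_avoiding_walk(adj, start, max_len):
    """Walk from `start`, never revisiting a node, for at most `max_len` nodes."""
    walk, visited = [start], {start}
    while len(walk) < max_len:
        candidates = [v for v in adj[walk[-1]] if v not in visited]
        if not candidates:          # dead end: end the short sequence early
            break
        nxt = random.choice(candidates)
        walk.append(nxt)
        visited.add(nxt)
    return walk

def sample_entropy(walks):
    """Shannon entropy (bits) of the node-occurrence distribution in `walks`."""
    counts = Counter(node for walk in walks for node in walk)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Toy HIN-like adjacency: author node 'a0' is a hub linked to many papers 'p*'.
adj = {
    "a0": ["p1", "p2", "p3", "p4"],
    "p1": ["a0", "a1"], "p2": ["a0", "a1"],
    "p3": ["a0", "a2"], "p4": ["a0", "a2"],
    "a1": ["p1", "p2"], "a2": ["p3", "p4"],
}
walks = [self_avoiding_walk(adj, random.choice(list(adj)), max_len=4)
         for _ in range(1000)]
print(f"sample entropy: {sample_entropy(walks):.3f} bits")
```

Because revisits are forbidden within a sequence, the hub 'a0' cannot dominate any single walk, which spreads occurrences more evenly across nodes and raises the entropy of the sample relative to an unconstrained random walk of the same length.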
