WMPEClus: Clustering via Weighted Meta-Path Embedding for Heterogeneous Information Networks

A low-dimensional embedding of the nodes is highly convenient for clustering, one of the most fundamental tasks for heterogeneous information networks (HINs). However, random walk-based network embedding has been proved equivalent to matrix factorization, whose computational cost is high. Moreover, mapping different types of nodes into one metric space may result in incompatibility. To cope with these two challenges, this paper proposes a weighted meta-path embedding based clustering method called WMPEClus. First, to solve the incompatibility problem, the original network is transformed into several subnetworks with independent semantics specified by meta-paths, which are generated automatically by our method. Second, an approximate commute embedding approach, which avoids eigen-decomposition to reduce computational cost, is used to learn representations of the nodes in each subnetwork. Finally, a unified probabilistic generative model is designed to aggregate the vectorized representations learned in different metric spaces for clustering. Experimental results show that WMPEClus is effective in HIN clustering and outperforms the state-of-the-art baselines on two real-world datasets.
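
To make the idea of a commute embedding without eigen-decomposition concrete, the following is a minimal sketch in the spirit of random-projection approaches to resistance distance. It is an illustration under stated assumptions, not the implementation used in WMPEClus: the function names `conjugate_gradient` and `approx_commute_embedding`, the dense NumPy matrices, the plain conjugate-gradient solver (standing in for a fast Laplacian solver), and the assumption of a connected, undirected, weighted subnetwork are all hypothetical choices made for this example.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive semi-definite A (hypothetical helper).

    Starting from x = 0, the iterates stay in the range of A, so for a
    consistent system this converges to the minimum-norm solution.
    """
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - A x with x = 0
    p = r.copy()
    rs = r @ r
    stop = tol * np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= stop:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def approx_commute_embedding(edges, weights, n, k, seed=0):
    """Return a k x n matrix whose column i embeds node i (illustrative sketch).

    Squared Euclidean distances between columns approximate commute times.
    Assumes a connected, undirected graph; no eigen-decomposition is used.
    """
    rng = np.random.default_rng(seed)
    m = len(edges)
    w = np.asarray(weights, dtype=float)
    # Signed edge-node incidence matrix B (m x n).
    B = np.zeros((m, n))
    for e, (u, v) in enumerate(edges):
        B[e, u], B[e, v] = 1.0, -1.0
    L = B.T @ (w[:, None] * B)        # graph Laplacian, L = B^T W B
    volume = 2.0 * w.sum()            # sum of weighted degrees
    # Random +-1/sqrt(k) projection of the m-dimensional edge space down to k.
    Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
    Y = Q @ (np.sqrt(w)[:, None] * B)             # k x n
    # One linear solve per projected row replaces the eigen-decomposition.
    Z = np.vstack([conjugate_gradient(L, y) for y in Y])
    return np.sqrt(volume) * Z
```

For a subnetwork with `n` nodes, `approx_commute_embedding(edges, weights, n, k)` yields `k`-dimensional node vectors that could then be passed to any clustering step; larger `k` gives a tighter approximation of commute times at higher cost.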
