MetaGraph2Vec: Complex Semantic Path Augmented Heterogeneous Network Embedding

Network embedding in heterogeneous information networks (HINs) is a challenging task, due to complications of different node types and rich relationships between nodes. As a result, conventional network embedding techniques cannot work on such HINs. Recently, metapath-based approaches have been proposed to characterize relationships in HINs, but they are ineffective in capturing rich contexts and semantics between nodes for embedding learning, mainly because (1) metapath is a rather strict single path node-node relationship descriptor, which is unable to accommodate variance in relationships, and (2) only a small portion of paths can match the metapath, resulting in sparse context information for embedding learning. In this paper, we advocate a new metagraph concept to capture richer structural contexts and semantics between distant nodes. A metagraph contains multiple paths between nodes, each describing one type of relationships, so the augmentation of multiple metapaths provides an effective way to capture rich contexts and semantic relations between nodes. This greatly boosts the ability of metapath-based embedding techniques in handling very sparse HINs. We propose a new embedding learning algorithm, namely MetaGraph2Vec, which uses metagraph to guide the generation of random walks and to learn latent embeddings of multi-typed HIN nodes. Experimental results show that MetaGraph2Vec is able to outperform the state-of-the-art baselines in various heterogeneous network mining tasks such as node classification, node clustering, and similarity search.

[1]  Nitesh V. Chawla,et al.  metapath2vec: Scalable Representation Learning for Heterogeneous Networks , 2017, KDD.

[2]  Chengqi Zhang,et al.  Network Representation Learning: A Survey , 2017, IEEE Transactions on Big Data.

[3]  Yizhou Sun,et al.  Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification , 2016, WSDM.

[4]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[5]  Xiang Li,et al.  Meta Structure: Computing Relevance in Large Heterogeneous Information Networks , 2016, KDD.

[6]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[7]  Chengqi Zhang,et al.  User Profile Preserving Social Network Embedding , 2017, IJCAI.

[8]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[9]  Qiongkai Xu,et al.  GraRep: Learning Graph Representations with Global Structural Information , 2015, CIKM.

[10]  Wenwu Zhu,et al.  Structural Deep Network Embedding , 2016, KDD.

[11]  Chengqi Zhang,et al.  Homophily, Structure, and Content Augmented Network Representation Learning , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[12]  Philip S. Yu,et al.  Semantic Path based Personalized Recommendation on Weighted Heterogeneous Information Networks , 2015, CIKM.

[13]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[14]  Laks V. S. Lakshmanan,et al.  HeteroMF: recommendation in heterogeneous information networks using context dependent factor models , 2013, WWW.

[15]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[16]  Philip S. Yu,et al.  Integrating meta-path selection with user-guided object clustering in heterogeneous information networks , 2012, KDD.