Neural PathSim for Inductive Similarity Search in Heterogeneous Information Networks

PathSim is a widely used meta-path-based similarity in heterogeneous information networks. Numerous applications rely on the computation of PathSim, including similarity search and clustering. Computing PathSim scores on large graphs is computationally challenging due to its high time and storage complexity. In this paper, we propose to transform the problem of approximating the ground truth PathSim scores into a learning problem. We design an encoder-decoder based framework, NeuPath, where the algorithmic structure of PathSim is considered. Specifically, the encoder module identifies Top T optimized path instances, which can approximate the ground truth PathSim, and maps each path instance to an embedding vector. The decoder transforms each embedding vector into a scalar respectively, which identifies the similarity score. We perform extensive experiments on two real-world datasets in different domains, ACM and IMDB. Our results demonstrate that NeuPath performs better than state-of-the-art baselines in the PathSim approximation task and similarity search task.

[1]  Philip S. Yu,et al.  Integrating meta-path selection with user-guided object clustering in heterogeneous information networks , 2012, KDD.

[2]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[3]  Dik Lun Lee,et al.  Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks , 2017, KDD.

[4]  Kevin Chen-Chuan Chang,et al.  Semantic proximity search on graphs with metagraph-based learning , 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[5]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[6]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[7]  Philip S. Yu,et al.  A Survey of Heterogeneous Information Network Analysis , 2015, IEEE Transactions on Knowledge and Data Engineering.

[8]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[9]  Yu He,et al.  HeteSpaceyWalk: A Heterogeneous Spacey Random Walk for Heterogeneous Information Network Embedding , 2019, CIKM.

[10]  Xiaowei Xu,et al.  SCAN: a structural clustering algorithm for networks , 2007, KDD '07.

[11]  Philip S. Yu,et al.  HeteSim: A General Framework for Relevance Measure in Heterogeneous Networks , 2013, IEEE Transactions on Knowledge and Data Engineering.

[12]  Xin Jiang,et al.  Neural Subgraph Isomorphism Counting , 2020, KDD.

[13]  Ni Lao,et al.  Fast query execution for retrieval models based on path-constrained random walks , 2010, KDD.

[14]  FoussFrancois,et al.  Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation , 2007 .

[15]  Christos Faloutsos,et al.  Fast Random Walk with Restart and Its Applications , 2006, Sixth International Conference on Data Mining (ICDM'06).

[16]  Majid Sarrafzadeh,et al.  HeteroMed: Heterogeneous Information Network for Medical Diagnosis , 2018, CIKM.

[17]  Yizhou Sun,et al.  Personalized entity recommendation: a heterogeneous information network approach , 2014, WSDM.

[18]  Yanfang Ye,et al.  Heterogeneous Graph Attention Network , 2019, WWW.

[19]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[20]  Yizhou Sun,et al.  Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach , 2019, CIKM.

[21]  Qiaozhu Mei,et al.  PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks , 2015, KDD.

[22]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[23]  Jiawei Han,et al.  KnowSim: A Document Similarity Measure on Structured Heterogeneous Information Networks , 2015, 2015 IEEE International Conference on Data Mining.

[24]  Jure Leskovec,et al.  Neural Subgraph Matching , 2020, ArXiv.

[25]  Jure Leskovec,et al.  How Powerful are Graph Neural Networks? , 2018, ICLR.

[26]  Ni Lao,et al.  Relational retrieval using a combination of path-constrained random walks , 2010, Machine Learning.

[27]  Philip S. Yu,et al.  Heterogeneous Information Network Embedding for Recommendation , 2017, IEEE Transactions on Knowledge and Data Engineering.

[28]  Max Welling,et al.  Modeling Relational Data with Graph Convolutional Networks , 2017, ESWC.

[29]  Philip S. Yu,et al.  PathSim , 2011 .

[30]  Raia Hadsell,et al.  Neural Execution of Graph Algorithms , 2020, ICLR.

[31]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[32]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[33]  Phuc Do,et al.  DW-PathSim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network* , 2018, J. Inf. Telecommun..

[34]  Yizhou Sun,et al.  Heterogeneous Graph Transformer , 2020, WWW.

[35]  Reynold Cheng,et al.  Discovering Meta-Paths in Large Heterogeneous Information Networks , 2015, WWW.

[36]  Nitesh V. Chawla,et al.  metapath2vec: Scalable Representation Learning for Heterogeneous Networks , 2017, KDD.

[37]  Ken-ichi Kawarabayashi,et al.  What Can Neural Networks Reason About? , 2019, ICLR.

[38]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[39]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[40]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[41]  François Fouss,et al.  Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation , 2007, IEEE Transactions on Knowledge and Data Engineering.

[42]  Jiawei Han,et al.  Text Classification with Heterogeneous Information Network Kernels , 2016, AAAI.

[43]  Yanfang Ye,et al.  HinDroid: An Intelligent Android Malware Detection System Based on Structured Heterogeneous Information Network , 2017, KDD.

[44]  Jaewoo Kang,et al.  Graph Transformer Networks , 2019, NeurIPS.

[45]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[46]  J. Leskovec,et al.  Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning , 2020, NeurIPS.