PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks

The heterogeneous information network (HIN) is a powerful and ubiquitous representation paradigm for networked, multi-typed data. Defining proper relevance measures is a fundamental problem of great practical importance for network mining tasks. Inspired by our probabilistic interpretation of existing path-based relevance measures, we propose to study HIN relevance from a probabilistic perspective. We also identify cross-meta-path synergy in real-world data and propose to model it; this characteristic is important for defining path-based HIN relevance and has not been modeled by existing methods. We establish a generative model to derive a novel path-based relevance measure that is data-driven and tailored to each HIN. We develop an inference algorithm to find the maximum a posteriori (MAP) estimate of the model parameters, which requires non-trivial techniques. Experiments on two real-world datasets demonstrate the effectiveness of the proposed model and relevance measure.
