Path Sampling Based Relevance Search in Heterogeneous Networks

With the growing study of heterogeneous networks, searching for relevant objects of different types has become a research focus. For example, in a movie network, one may want to find the actors who most frequently work with the director Steven Spielberg. Because traditional random walk models are costly in both time and memory, this paper presents RSSim, a random path sampling measure for discovering relevant objects in heterogeneous networks that allows a tradeoff between efficiency and estimation accuracy. The key idea is to use Monte Carlo simulation to compute an \(\varepsilon\)-approximation of a relevance measure defined on a meta path, an important concept for capturing the semantics of a search. Because Monte Carlo simulation is lightweight and fast, the algorithm is applicable to large-scale networks. Moreover, we give theoretical proofs of the error bound and confidence of the estimation. Experiments show that RSSim is 100 times faster than several alternative methods and closely approximates the ranking accuracy of the baseline with a small sample size.
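
To make the sampling idea concrete, the following is a minimal sketch and not the RSSim algorithm itself, whose exact definition the abstract does not give. It assumes that relevance is the probability that a uniform random walk following a meta path from the query object ends at a candidate object, and it picks the number of samples from Hoeffding's inequality, which may differ from the bound proved in the paper. The graph layout, the function names, and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter


def hoeffding_sample_size(eps, delta):
    """Samples needed so that one estimated probability is within eps of
    its true value with probability at least 1 - delta (Hoeffding bound).
    Assumption: the paper may use a different bound."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sample_path_end(graph, source, meta_path, rng):
    """Walk one random path that follows the edge types in meta_path.
    Returns the end node, or None if the walk hits a dead end."""
    node = source
    for edge_type in meta_path:
        neighbors = graph.get((node, edge_type), [])
        if not neighbors:
            return None
        node = rng.choice(neighbors)  # uniform choice among typed neighbors
    return node


def estimate_relevance(graph, source, meta_path, eps=0.05, delta=0.05, seed=0):
    """Monte Carlo estimate of the reach probability of every end node."""
    rng = random.Random(seed)
    n = hoeffding_sample_size(eps, delta)
    hits = Counter()
    for _ in range(n):
        end = sample_path_end(graph, source, meta_path, rng)
        if end is not None:
            hits[end] += 1
    return {node: count / n for node, count in hits.items()}


# Hypothetical toy movie network: (node, edge type) -> neighbor list.
toy_graph = {
    ("director_x", "directed"): ["movie_1", "movie_2", "movie_3"],
    ("movie_1", "stars"): ["actor_a"],
    ("movie_2", "stars"): ["actor_a", "actor_b"],
    ("movie_3", "stars"): ["actor_a"],
}

scores = estimate_relevance(toy_graph, "director_x", ["directed", "stars"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this toy network the exact reach probabilities are 5/6 for `actor_a` and 1/6 for `actor_b`, so the estimates can be checked by hand. Lowering `eps` or `delta` raises the sample count, which is the efficiency versus accuracy tradeoff the abstract describes.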
