Fast top-k path-based relevance query on massive graphs

The task of obtaining the items highly-relevant to a given set of query items is a basis for various applications, such as recommendation and prediction. A family of path-based relevance metrics, which quantify item relevance based on the paths in a given item graph, have been shown to be effective in capturing the relevance in many applications. Despite their effectiveness, path-based relevance normally requires time-consuming iterative computation. We propose an approach to obtain the top-k most relevant items for a given query item set quickly. Our approach can obtain the top-k items without having to compute converged scores. The approach is designed for a distributed environment, which makes it scale for massive graphs having hundreds of millions of nodes. Our experimental results show that the proposed approach can produce the result 20 to 50 times faster than a previously proposed approach and can scale well with both the size of input and the number of machines used in the computation.

[1]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[2]  Shankar Kumar,et al.  Video suggestion and discovery for youtube: taking random walks through the view graph , 2008, WWW.

[3]  Christos Faloutsos,et al.  Fast Random Walk with Restart and Its Applications , 2006, Sixth International Conference on Data Mining (ICDM'06).

[4]  Hector Garcia-Molina,et al.  Link spam detection based on mass estimation , 2006, VLDB.

[5]  András A. Benczúr,et al.  SpamRank - fully automatic link spam detection. Work in progress , 2005 .

[6]  Dániel Fogaras,et al.  Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds, and Experiments , 2005, Internet Math..

[7]  Ashish Goel,et al.  Fast Incremental and Personalized PageRank , 2010, Proc. VLDB Endow..

[8]  Michael D. Ernst,et al.  HaLoop , 2010, Proc. VLDB Endow..

[9]  Takuya Akiba,et al.  Computing Personalized PageRank Quickly by Exploiting Graph Structures , 2014, Proc. VLDB Endow..

[10]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[11]  Lixin Gao,et al.  Maiter : A Message-Passing Distributed Framework for Accumulative Iterative Computation , 2012 .

[12]  Yasuhiro Fujiwara,et al.  Fast and Exact Top-k Search for Random Walk with Restart , 2012, Proc. VLDB Endow..

[13]  Jimeng Sun,et al.  Neighborhood formation and anomaly detection in bipartite graphs , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[14]  Soumen Chakrabarti,et al.  Fast algorithms for topk personalized pagerank queries , 2008, WWW.

[15]  Lixin Gao,et al.  Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation , 2017, 1710.05785.

[16]  Taher H. Haveliwala Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search , 2003, IEEE Trans. Knowl. Data Eng..

[17]  Sebastiano Vigna,et al.  The webgraph framework I: compression techniques , 2004, WWW '04.

[18]  Yasuhiro Fujiwara,et al.  Efficient ad-hoc search for personalized PageRank , 2013, SIGMOD '13.

[19]  Jure Leskovec,et al.  The dynamics of viral marketing , 2005, EC '06.

[20]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..

[21]  Ioannis Konstas,et al.  On social networks and collaborative recommendation , 2009, SIGIR.

[22]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[23]  Vahab S. Mirrokni,et al.  Local Computation of PageRank Contributions , 2007, Internet Math..

[24]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[25]  Christos Faloutsos,et al.  Sampling from large graphs , 2006, KDD '06.

[26]  Yanfeng Zhang,et al.  PrIter: A Distributed Framework for Prioritizing Iterative Computations , 2011, IEEE Transactions on Parallel and Distributed Systems.

[27]  Leo Katz,et al.  A new status index derived from sociometric analysis , 1953 .

[28]  Kevin Chen-Chuan Chang,et al.  RoundTripRank: Graph-based proximity with importance and specificity? , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[29]  L. Takac DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS , 2012 .

[30]  Marco Gori,et al.  ItemRank: A Random-Walk Based Scoring Algorithm for Recommender Engines , 2007, IJCAI.

[31]  Yasuhiro Fujiwara,et al.  Efficient personalized pagerank with accuracy assurance , 2012, KDD.

[32]  Christos Faloutsos,et al.  Graph evolution: Densification and shrinking diameters , 2006, TKDD.

[33]  Eneko Agirre,et al.  Personalizing PageRank for Word Sense Disambiguation , 2009, EACL.

[34]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[35]  Lixin Gao,et al.  Fast Top-K Path-Based Relevance Query on Massive Graphs , 2016, IEEE Transactions on Knowledge and Data Engineering.

[36]  Pavel Berkhin,et al.  Bookmark-Coloring Algorithm for Personalized PageRank Computing , 2006, Internet Math..

[37]  Indranil Gupta,et al.  Delta-SimRank computing on MapReduce , 2012, BigMine '12.

[38]  Lixin Gao,et al.  Accelerate large-scale iterative computation through asynchronous accumulative updates , 2012, ScienceCloud '12.

[39]  Jon Kleinberg,et al.  The link prediction problem for social networks , 2003, CIKM '03.

[40]  Jiawei Han,et al.  A Unified Framework for Link Recommendation Using Random Walks , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[41]  Qing Zhang,et al.  Assessing and ranking structural correlations in graphs , 2011, SIGMOD '11.