Reducing seed noise in personalized PageRank

Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.

[1]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[2]  Christos Faloutsos,et al.  Center-piece subgraphs: problem definition and fast solutions , 2006, KDD '06.

[3]  Fang Wei-Kleiner TEDI: Efficient Shortest Path Query Answering on Graphs , 2011, Graph Data Management.

[4]  Yasuhiro Fujiwara,et al.  Fast and Exact Top-k Search for Random Walk with Restart , 2012, Proc. VLDB Endow..

[5]  K. Selçuk Candan,et al.  Reasoning for Web document associations and its applications in site map construction , 2002, Data Knowl. Eng..

[6]  K. Selçuk Candan,et al.  LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation , 2013, CIKM.

[7]  Gene H. Golub,et al.  Extrapolation methods for accelerating PageRank computations , 2003, WWW '03.

[8]  Kurt C. Foster,et al.  A Faster Katz Status Score Algorithm , 2001, Comput. Math. Organ. Theory.

[9]  Christos Faloutsos,et al.  Fast Random Walk with Restart and Its Applications , 2006, Sixth International Conference on Data Mining (ICDM'06).

[10]  Atish Das Sarma,et al.  Fast Distributed PageRank Computation , 2013, ICDCN.

[11]  Kenneth Ward Church,et al.  Query suggestion using hitting time , 2008, CIKM '08.

[12]  Shang-Hua Teng,et al.  Multiscale Matrix Sampling and Sublinear-Time PageRank Computation , 2012, Internet Math..

[13]  Dong Xin,et al.  Fast personalized PageRank on MapReduce , 2011, SIGMOD '11.

[14]  K. Selçuk Candan,et al.  Using Random Walks for Mining Web Document Associations , 2000, PAKDD.

[15]  Soumen Chakrabarti,et al.  Fast algorithms for topk personalized pagerank queries , 2008, WWW.

[16]  Dániel Fogaras,et al.  Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds, and Experiments , 2005, Internet Math..

[17]  Adam Tauman Kalai,et al.  Trust-based recommendation systems: an axiomatic approach , 2008, WWW.

[18]  Vagelis Hristidis,et al.  ObjectRank: Authority-Based Keyword Search in Databases , 2004, VLDB.

[19]  S. Borgatti,et al.  Betweenness centrality measures for directed graphs , 1994 .

[20]  Chun Chen,et al.  Personalized tag recommendation using graph-based ranking on multi-type interrelated objects , 2009, SIGIR.

[21]  Christos Faloutsos,et al.  Fast direction-aware proximity for graph mining , 2007, KDD '07.

[22]  George Casella,et al.  Erratum: Inverting a Sum of Matrices , 1990, SIAM Rev..

[23]  Timothy A. Davis,et al.  Direct methods for sparse linear systems , 2006, Fundamentals of algorithms.

[24]  Soumen Chakrabarti,et al.  Dynamic personalized pagerank in entity-relation graphs , 2007, WWW '07.

[25]  Pavel Berkhin,et al.  Bookmark-Coloring Algorithm for Personalized PageRank Computing , 2006, Internet Math..

[26]  Christos Faloutsos,et al.  ANF: a fast and scalable tool for data mining in massive graphs , 2002, KDD.

[27]  Setsuo Ohsuga,et al.  INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES , 1977 .

[28]  Edith Cohen,et al.  Reachability and distance queries via 2-hop labels , 2002, SODA '02.

[29]  Ashish Goel,et al.  Fast Incremental and Personalized PageRank , 2010, Proc. VLDB Endow..

[30]  Leo Katz,et al.  A new status index derived from sociometric analysis , 1953 .

[31]  Purnamrita Sarkar,et al.  Fast incremental proximity search in large graphs , 2008, ICML '08.

[32]  François Fouss,et al.  Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation , 2007, IEEE Transactions on Knowledge and Data Engineering.

[33]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[34]  Takuya Akiba,et al.  Computing Personalized PageRank Quickly by Exploiting Graph Structures , 2014, Proc. VLDB Endow..

[35]  Amy Nicole Langville,et al.  Updating pagerank with iterative aggregation , 2004, WWW Alt. '04.

[36]  Taher H. Haveliwala Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[37]  Lei Zou,et al.  DistanceJoin: Pattern Match Query In a Large Graph Database , 2009, Proc. VLDB Endow..

[38]  James T. Halbert,et al.  Scalable graph clustering with parallel approximate PageRank , 2014, Social Network Analysis and Mining.

[39]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[40]  Jian Pei,et al.  Efficiently indexing shortest paths by exploiting symmetry in graphs , 2009, EDBT '09.

[41]  K. Avrachenkov,et al.  Quick Detection of Top-k Personalized PageRank Lists , 2011, WAW.

[42]  Ellen M. Voorhees,et al.  Retrieval evaluation with incomplete information , 2004, SIGIR '04.

[43]  S. Borgatti,et al.  Network Measures of Social Capital , 2012 .

[44]  Marco Rosa,et al.  HyperANF: approximating the neighbourhood function of very large graphs on a budget , 2010, WWW.

[45]  Mo Chen,et al.  Clustering via Random Walk Hitting Time on Directed Graphs , 2008, AAAI.

[46]  Abdulmotaleb El-Saddik,et al.  Personalized PageRank vectors for tag recommendations: inside FolkRank , 2011, RecSys '11.