Quick Detection of Top-k Personalized PageRank Lists

We study the problem of quickly detecting top-k Personalized PageRank (PPR) lists. This problem has a number of important applications, such as finding local cuts in large graphs, estimating similarity distance, and person name disambiguation. We argue that two observations are important when finding top-k PPR lists. First, it is crucial to detect the top-k most important neighbors of a node quickly, whereas the exact order within the top-k list and the exact PPR values are far less important. Second, by allowing a small number of "wrong" elements in top-k lists, we achieve large computational savings without degrading the quality of the results. Based on these ideas, we propose Monte Carlo methods for quick detection of top-k PPR lists. We demonstrate the effectiveness of these methods on the Web and Wikipedia graphs, provide a performance evaluation, and supply stopping criteria.
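
To make the idea concrete, the following is a minimal sketch of one common Monte Carlo estimator for PPR, not necessarily the exact method proposed here. It assumes that walks start at the seed node, continue along a uniformly chosen out-link with probability `damping` (0.85 is an assumed value), and that the fraction of walks ending at a node estimates its PPR score. The function name `top_k_ppr`, the adjacency-dict graph format, and the handling of dangling nodes (the walk simply stops) are all illustrative assumptions.

```python
import random
from collections import Counter


def top_k_ppr(graph, seed, k, num_walks=10000, damping=0.85, rng=None):
    """Estimate the top-k PPR list of `seed` from random-walk end points.

    graph: dict mapping each node to a list of out-neighbors (assumed format).
    Returns a list of (node, estimated_score) pairs, highest score first.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    end_counts = Counter()
    for _ in range(num_walks):
        node = seed
        # Continue the walk with probability `damping`, stop otherwise.
        while rng.random() < damping:
            neighbors = graph.get(node, [])
            if not neighbors:
                break  # assumption: a dangling node ends the walk
            node = rng.choice(neighbors)
        end_counts[node] += 1
    # Only the k most frequent end points are needed, not exact scores.
    return [(n, c / num_walks) for n, c in end_counts.most_common(k)]


# Tiny hypothetical graph: node "d" is not reachable from "a".
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a"]}
top3 = top_k_ppr(toy_graph, "a", k=3)
```

Because only membership in the top-k list matters, a small `num_walks` is often enough to separate the leading nodes from the rest, even while the individual score estimates are still noisy.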
