Local computation of PageRank: the ranking side

Imagine you are a social network user who wants to find, in a list of potential candidates, the best candidate for a job on the basis of their PageRank-induced importance ranking. Is it possible to compute this ranking at low cost, by visiting only small subnetworks around the nodes that represent the candidates? The fundamental problem underpinning this question, i.e. locally computing the PageRank ranking of $k$ nodes in an $n$-node graph, was first raised by Chen et al. (CIKM 2004) and then restated by Bar-Yossef and Mashiach (CIKM 2008). In this paper we formalize the problem and provide its first analysis, proving that any local algorithm that computes a correct ranking must take into consideration $\Omega(\sqrt{kn})$ nodes. This holds even when ranking the top $k$ nodes of the graph, even if their PageRank scores are "well separated", and even if the algorithm is randomized; for deterministic algorithms we prove a stronger $\Omega(n)$ bound. Experiments carried out on large, publicly available crawls of the web and of a social network show that in practice, too, the fraction of the graph that must be visited to compute the ranking may be considerable, both for algorithms that are always correct and for algorithms that employ (efficient) local score approximations.
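
To make the object of study concrete, the following is a minimal sketch of the global baseline that a local algorithm would try to avoid: it computes PageRank over the whole graph by power iteration and then orders a list of candidates by score. It is not the method of this paper. The function names, the toy graph, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Global PageRank by power iteration over a dict {node: [successors]}.

    Illustrative only: damping and iteration count are assumed values.
    """
    nodes = list(out_links)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every node splits its current score evenly among its successors.
        for v in nodes:
            for w in out_links[v]:
                nxt[w] += damping * score[v] / len(out_links[v])
        score = nxt
    return score


def rank_candidates(out_links, candidates):
    """Order the given candidates by decreasing global PageRank score."""
    score = pagerank(out_links)
    return sorted(candidates, key=lambda v: score[v], reverse=True)


# Hypothetical 4-node graph: "c" and "d" have no in-links.
graph = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a", "b"]}
ranking = rank_candidates(graph, ["c", "b", "a"])
print(ranking)
```

Note that `pagerank` touches every node and arc on each iteration, which is exactly the cost that a local algorithm, restricted to small subnetworks around the candidates, would hope to escape.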
