Towards Scaling Fully Personalized PageRank

Personalized PageRank expresses backlink-based page quality around user-selected pages, much as PageRank expresses quality over the entire Web. Existing personalized PageRank algorithms, however, can serve on-line queries only for a restricted choice of page selections. In this paper we achieve full personalization with a novel algorithm that computes a compact database of simulated random walks; this database can serve arbitrary personal choices of small subsets of web pages. We prove that, for a fixed error probability, the size of our database is linear in the number of web pages. We justify our estimation approach with asymptotic worst-case lower bounds: we show that exact personalized PageRank values can be obtained only from a database of quadratic size.
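To make the idea of a database of simulated random walks concrete, the following is a minimal Monte Carlo sketch, not necessarily the paper's exact procedure. It assumes the standard formulation in which a walk started at page `u` stops at each step with probability `c` (the teleport probability) and otherwise follows a uniformly chosen out-link; the page where it stops is then distributed according to the personalized PageRank vector of `u`. The function names `fingerprint`, `build_database` and `estimate_ppr`, the parameter `n_walks`, and the rule that a page with no out-links behaves as if it had a self-loop are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def fingerprint(graph, start, c, rng):
    """Simulate one random walk from `start` and return the page where it stops.

    At each step the walk stops with probability c; otherwise it moves to a
    uniformly chosen out-neighbour. Assumption (hypothetical): a page with no
    out-links is treated as having a self-loop, so the walk stays put.
    """
    node = start
    while rng.random() >= c:  # continue with probability 1 - c
        out_links = graph.get(node)
        if out_links:
            node = rng.choice(out_links)
    return node


def build_database(graph, c, n_walks, seed=0):
    """Store n_walks walk endpoints per page: size is linear in the page count.

    `graph` maps every page to a list of its out-neighbours; every page must
    appear as a key so that it receives its own fingerprints.
    """
    rng = random.Random(seed)
    return {u: [fingerprint(graph, u, c, rng) for _ in range(n_walks)] for u in graph}


def estimate_ppr(database, preferred):
    """Estimate personalized PageRank for a uniform preference over `preferred`.

    By linearity, the vector for a page set is the average of the per-page
    vectors; pooling equally many endpoints per page computes that average.
    """
    counts = Counter()
    total = 0
    for u in preferred:
        counts.update(database[u])
        total += len(database[u])
    return {v: k / total for v, k in counts.items()}
```

For example, on the two-page cycle `{0: [1], 1: [0]}` with `c = 0.5`, a walk from page 0 ends back at page 0 exactly when it takes an even number of steps, which happens with probability 2/3; `estimate_ppr(build_database(g, 0.5, 20000), [0])[0]` should therefore come out close to 0.667. Because each query only merges precomputed endpoint lists, any small subset of pages can be served on-line, at the cost of a sampling error that shrinks as `n_walks` grows.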
