Traps and Pitfalls of Topic-Biased PageRank

We discuss a number of issues in the definition, computation and comparison of PageRank values that have been addressed only sparsely in the literature, often with contradictory approaches. We study the difference between weakly and strongly preferential PageRank, which patch the dangling nodes with different distributions. We extend analytical formulae known for the strongly preferential case and corroborate our results with experiments on a snapshot of 100 million pages of the .uk domain. The experiments show that the two PageRank versions are poorly correlated, so results about one cannot be blindly applied to the other; moreover, our computations highlight some new concerns about the use of exchange-based correlation indices (such as Kendall's τ) on approximated rankings.
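To make the distinction concrete, the following is a minimal sketch of the two variants on a toy graph, not the computation used in the experiments. It assumes the usual convention that the strongly preferential version redistributes the rank of dangling nodes according to the preference vector, while the weakly preferential version redistributes it uniformly. The function name `pagerank`, the four-node graph, the damping factor 0.85 and the iteration count are all illustrative assumptions.

```python
def pagerank(successors, preference, dangling_dist, alpha=0.85, iterations=200):
    """Power iteration for PageRank with an explicit dangling-node patch.

    successors[i] lists the nodes that node i links to (empty if dangling).
    preference is the teleportation distribution.
    dangling_dist is the distribution that receives the rank of dangling nodes.
    """
    n = len(successors)
    rank = list(preference)
    for _ in range(iterations):
        spread = [0.0] * n
        dangling_mass = 0.0
        for i, outs in enumerate(successors):
            if outs:
                share = rank[i] / len(outs)
                for j in outs:
                    spread[j] += share
            else:
                # Node i has no out-links: its rank is patched in below.
                dangling_mass += rank[i]
        rank = [
            alpha * (spread[j] + dangling_mass * dangling_dist[j])
            + (1.0 - alpha) * preference[j]
            for j in range(n)
        ]
    return rank


# Hypothetical toy graph: 0 -> 1 -> 2, 3 -> 0; node 2 is dangling.
successors = [[1], [2], [], [0]]
preference = [1.0, 0.0, 0.0, 0.0]   # topic bias concentrated on node 0
uniform = [0.25, 0.25, 0.25, 0.25]

# Assumed convention: strong patches dangling nodes with the preference
# vector, weak patches them with the uniform distribution.
strong = pagerank(successors, preference, dangling_dist=preference)
weak = pagerank(successors, preference, dangling_dist=uniform)

print("strongly preferential:", strong)
print("weakly preferential:  ", weak)
```

In this toy graph node 3 has no incoming links and lies outside the preferred topic, so it receives no rank in the strongly preferential version but a positive share in the weakly preferential one, which gives a small-scale picture of how the two vectors can diverge.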
