Experimental Bounds on the Usefulness of Personalized and Topic-Sensitive PageRank

PageRank is an algorithm used by several search engines to rank web documents by their assumed relevance and popularity, as deduced from the Web's link structure. It determines a global ordering of candidate search results according to each page's popularity, measured by the number and importance of the pages linking to it. Personalized and topic-sensitive PageRank are variants of the algorithm that return a local ranking reflecting each user's preferences, biased toward a set of pages the user trusts or topics the user prefers. In this paper we compare personalized and topic-sensitive local PageRanks with the global PageRank, showing experimentally how similar or dissimilar the results of personalization can be to the original global ranking and to other personalizations. Our approach is to examine a snapshot of the Web and determine how advantageous personalization can be in the best and worst cases, and how it performs at various values of the damping factor in the PageRank formula.
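To make the roles of the damping factor and the preference set concrete, the following is a minimal sketch of PageRank by power iteration, assuming the commonly used formulation in which a random surfer follows an out-link with probability equal to the damping factor and otherwise jumps to a page drawn from a preference distribution. The function name `pagerank`, its parameters, and the four-page toy graph are illustrative assumptions, not the setup used in the paper. A uniform preference gives the global rank; concentrating the preference on trusted pages gives a personalized rank.

```python
def pagerank(links, damping=0.85, preference=None, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a small graph (illustrative sketch).

    links: dict mapping each page to the list of pages it links to;
           every link target must also appear as a key.
    preference: optional dict of non-negative weights over pages
           (the personalization set); None means uniform, i.e. global PageRank.
    """
    nodes = sorted(links)
    n = len(nodes)
    if preference is None:
        v = {u: 1.0 / n for u in nodes}
    else:
        total = sum(preference.values())
        v = {u: preference.get(u, 0.0) / total for u in nodes}

    rank = dict(v)  # start from the preference distribution
    for _ in range(max_iter):
        # Rank held by pages without out-links is redistributed according to v
        # (one common convention; an assumption of this sketch).
        dangling = sum(rank[u] for u in nodes if not links[u])
        new = {u: (1.0 - damping) * v[u] + damping * dangling * v[u] for u in nodes}
        for u in nodes:
            out = links[u]
            if out:
                share = damping * rank[u] / len(out)
                for w in out:
                    new[w] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: page "d" has no incoming links.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

global_rank = pagerank(links)                           # uniform preference
personal_rank = pagerank(links, preference={"d": 1.0})  # user trusts only "d"
no_follow_rank = pagerank(links, damping=0.0, preference={"d": 1.0})
```

In this sketch, lowering `damping` pulls the result toward the preference distribution itself, while raising it lets the link structure dominate, which is why the amount by which personalization can change a ranking depends on the damping factor.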
