C-Rank and its variants: A contribution-based ranking approach exploiting links and content

This paper addresses the problem of combining link and content information in Web page ranking effectively, while remaining efficient enough to be applicable to real-world search engines. Unlike previous surfer models, our approach takes the viewpoint of a Web page author. From this viewpoint, we formulate the concept of a contribution score, which indicates the extent to which a term in a page is utilized by other pages. To improve efficiency without loss of effectiveness, we exploit the expectations that both Web page authors and Web search engine users have of retrieval results, and restrict the candidate terms that can contribute to other pages to the set of keywords of each page. We propose three contribution-based models: C-Rank, PC-Rank, and HC-Rank. Experimental results show that C-Rank provides the best precision among the models and is very effective for topic distillation tasks on the .GOV collection in TREC. Most importantly, the proposed models are efficient enough to be applicable to real-world search engines.
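The abstract does not give the formula for the contribution score, so the following is only a toy sketch of the general idea, not the definition used by C-Rank, PC-Rank, or HC-Rank. It assumes, purely for illustration, that a keyword of a page counts as "utilized" by another page when that other page links to it and also contains the term. The function name `toy_contribution_scores` and the `links`, `keywords`, and `texts` structures are hypothetical.

```python
from collections import defaultdict


def toy_contribution_scores(links, keywords, texts):
    """Toy one-hop contribution count (illustrative assumption, not the paper's formula).

    links:    dict mapping a page to the set of pages it links to
    keywords: dict mapping a page to its keyword set (the only candidate terms)
    texts:    dict mapping a page to the set of terms appearing in it

    Returns a dict mapping (page, term) to the number of linking pages
    that also contain the term.
    """
    scores = defaultdict(int)
    for source, targets in links.items():
        source_terms = texts.get(source, set())
        for target in targets:
            if target == source:
                continue  # ignore self-links
            # Only keywords of the linked-to page may contribute.
            for term in keywords.get(target, set()):
                if term in source_terms:
                    scores[(target, term)] += 1
    return dict(scores)


# Hypothetical three-page example: pages "a" and "b" both link to "c".
links = {"a": {"c"}, "b": {"c"}, "c": set()}
keywords = {"c": {"tax", "forms"}}
texts = {
    "a": {"tax", "refund"},
    "b": {"tax", "forms"},
    "c": {"tax", "forms"},
}
scores = toy_contribution_scores(links, keywords, texts)
```

Restricting the inner loop to each page's keyword set mirrors the efficiency argument in the abstract: the work grows with the number of keywords per page rather than with the full vocabulary.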
