Leveraging temporal dynamics of document content in relevance ranking

Many web documents are dynamic, with content that changes by varying amounts and at varying frequencies. However, current document search algorithms take a static view of document content, keeping only a single version of each document in the index at any point in time. In this paper, we present the first published analysis of using the temporal dynamics of document content to improve relevance ranking. We show that there is a strong relationship between the amount and frequency of content change and relevance. We develop a novel probabilistic document ranking algorithm that allows differential weighting of terms based on their temporal characteristics. By leveraging such content dynamics, we show significant performance improvements for navigational queries.
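As a rough illustration of what differential weighting of terms by temporal characteristics could look like, the sketch below scores a query against several crawled versions of one document. It is not the algorithm from the paper: the three longevity buckets, the cut-offs (0.8 and 0.3), the mixture weights, the smoothing constant `mu`, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def term_longevity(versions):
    """Fraction of document versions in which each term appears."""
    presence = Counter()
    for tokens in versions:
        presence.update(set(tokens))
    return {term: count / len(versions) for term, count in presence.items()}


def bucket(fraction, long_cut=0.8, short_cut=0.3):
    """Assign a term to a longevity bucket (cut-offs are assumed values)."""
    if fraction >= long_cut:
        return "long"
    if fraction <= short_cut:
        return "short"
    return "mid"


def build_models(versions):
    """Accumulate term counts into one Counter per longevity bucket."""
    longevity = term_longevity(versions)
    models = {"long": Counter(), "mid": Counter(), "short": Counter()}
    for tokens in versions:
        for term in tokens:
            models[bucket(longevity[term])][term] += 1
    return models


def score(query, models, weights, collection_prob, mu=0.1):
    """Log query likelihood under a weighted mixture of bucket models.

    Each bucket contributes its maximum-likelihood term probability,
    scaled by its weight. The mixture is then interpolated with a
    collection model so unseen terms do not produce log(0).
    An empty bucket contributes nothing in this sketch.
    """
    total = 0.0
    for term in query:
        p_doc = 0.0
        for name, weight in weights.items():
            size = sum(models[name].values())
            if size:
                p_doc += weight * models[name][term] / size
        p = (1 - mu) * p_doc + mu * collection_prob.get(term, 1e-6)
        total += math.log(p)
    return total
```

With weights that favor the long-lived bucket, a query term present in every version of a page (for example a site name in a navigational query) scores higher than it does with weights that favor short-lived terms. In practice such weights would be tuned on held-out relevance judgments rather than set by hand.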
