Timestamp-based result cache invalidation for web search engines

The result cache is a vital component for the efficiency of large-scale web search engines, and maintaining the freshness of cached query results is a current research challenge. As a remedy to this problem, our work proposes a new mechanism to identify queries whose cached results are stale. The basic idea behind our mechanism is to record the generation time of each cached query result and compare it with the update times of the relevant posting lists and documents to decide whether the result is stale. The proposed technique is evaluated using a Wikipedia document collection with real update information and a real-life query log. We show that our technique has good prediction accuracy relative to a baseline based on the time-to-live mechanism. Moreover, it is easy to implement and incurs less processing overhead on the system than a recently proposed, more sophisticated invalidation mechanism.
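
The timestamp comparison at the core of the mechanism can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the exact policy evaluated in this work: the names `CachedResult`, `term_updated_at`, `doc_updated_at`, and `is_stale` are hypothetical, and any single newer update is treated as sufficient to mark a result stale.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CachedResult:
    """A cached query result (hypothetical structure for illustration)."""
    terms: Tuple[str, ...]    # query terms whose posting lists were read
    doc_ids: Tuple[int, ...]  # documents returned in the cached result
    generated_at: float       # time at which the result was computed


def is_stale(entry: CachedResult,
             term_updated_at: Dict[str, float],
             doc_updated_at: Dict[int, float]) -> bool:
    """Return True if any relevant posting list or document was updated
    after the cached result was generated.

    Assumption: a missing timestamp means "never updated", so it cannot
    make the entry stale.
    """
    term_changed = any(
        term_updated_at.get(term, float("-inf")) > entry.generated_at
        for term in entry.terms
    )
    doc_changed = any(
        doc_updated_at.get(doc_id, float("-inf")) > entry.generated_at
        for doc_id in entry.doc_ids
    )
    return term_changed or doc_changed


# Usage: a result generated at t=100 is checked against later updates.
entry = CachedResult(terms=("web", "cache"), doc_ids=(7, 42), generated_at=100.0)
fresh = is_stale(entry, {"web": 90.0, "cache": 95.0}, {7: 80.0, 42: 99.0})
stale_by_term = is_stale(entry, {"web": 120.0}, {7: 80.0})
stale_by_doc = is_stale(entry, {"web": 90.0}, {42: 101.0})
```

In this sketch the check needs only one timestamp per posting list and per document, which is consistent with the low overhead claimed above; a real system would have to decide how aggressively an update should trigger invalidation.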
