A refreshing perspective of search engine caching

Commercial Web search engines have to process user queries over huge Web indexes under tight latency constraints. In practice, to achieve low latency, large result caches are employed and a portion of the query traffic is served using previously computed results. Moreover, search engines need to update their indexes frequently to incorporate changes to the Web. After every index update, however, the content of cache entries may become stale, thus decreasing the freshness of served results. In this work, we first argue that the real problem in today's caching for large-scale search engines is not eviction policies, but the ability to cope with changes to the index, i.e., cache freshness. We then introduce a novel algorithm that uses a time-to-live value to set cache entries to expire and selectively refreshes cached results by issuing refresh queries to back-end search clusters. The algorithm prioritizes the entries to refresh according to a heuristic that combines the frequency of access with the age of an entry in the cache. In addition, for setting the rate at which refresh queries are issued, we present a mechanism that takes into account idle cycles of back-end servers. Evaluation using a real workload shows that our algorithm can achieve hit rate improvements as well as reduction in average hit ages. An implementation of this algorithm is currently in production use at Yahoo!.

[1]  Divesh Srivastava,et al.  Interaction of query evaluation and buffer management for information retrieval , 1998, SIGMOD '98.

[2]  Darrell D. E. Long,et al.  Exploring the Bounds of Web Latency Reduction from Caching and Prefetching , 1997, USENIX Symposium on Internet Technologies and Systems.

[3]  K. Kavi Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .

[4]  Torsten Suel,et al.  Improved techniques for result caching in web search engines , 2009, WWW '09.

[5]  Vijay V. Raghavan,et al.  On the reuse of past optimal queries , 1995, SIGIR '95.

[6]  Rafael Alonso,et al.  Data caching issues in an information retrieval system , 1990, TODS.

[7]  S. V. Nagaraj Web Caching and Its Applications , 2004 .

[8]  S. V. Nagaraj Web Caching And Its Applications (Kluwer International Series in Engineering and Computer Science) , 2004 .

[9]  Igor Tatarinov An Efficient LFU-Like Policy for Web Caches , 1998 .

[10]  Yinglian Xie,et al.  Locality in search engine queries and its implications for caching , 2002, Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[11]  Torsten Suel,et al.  Three-Level Caching for Efficient Query Processing in Large Web Search Engines , 2005, WWW '05.

[12]  Alfred V. Aho,et al.  Principles of Optimal Page Replacement , 1971, J. ACM.

[13]  Hector Garcia-Molina,et al.  Caching and database scaling in distributed shared-nothing information retrieval systems , 1993, SIGMOD '93.

[14]  Edith Cohen,et al.  Refreshment policies for Web content caches , 2002, Comput. Networks.

[15]  Aristides Gionis,et al.  The impact of caching on search engines , 2007, SIGIR.

[16]  Shlomo Moran,et al.  Predictive caching and prefetching of query results in search engines , 2003, WWW '03.

[17]  J. Spencer Love,et al.  Caching strategies to improve disk system performance , 1994, Computer.

[18]  Peter J. Denning,et al.  Virtual memory , 1970, CSUR.

[19]  Fabrizio Silvestri,et al.  Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data , 2006, TOIS.

[20]  J. T. Robinson,et al.  Data cache management using frequency-based replacement , 1990, SIGMETRICS '90.

[21]  Wolfgang Effelsberg,et al.  Principles of database buffer management , 1984, TODS.

[22]  Li Fan,et al.  Web caching and Zipf-like distributions: evidence and implications , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[23]  Eric A. Brewer,et al.  Lessons from Giant-Scale Services , 2001, IEEE Internet Comput..

[24]  Evangelos P. Markatos,et al.  On caching search engine query results , 2001, Comput. Commun..

[25]  Sang Lyul Min,et al.  LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies , 2001, IEEE Trans. Computers.

[26]  Sandy Irani,et al.  Cost-Aware WWW Proxy Caching Algorithms , 1997, USENIX Symposium on Internet Technologies and Systems.

[27]  Ricardo Baeza-Yates,et al.  ResIn: a combination of results caching and index pruning for high-performance web search engines , 2008, SIGIR '08.

[28]  László Böszörményi,et al.  A survey of Web cache replacement strategies , 2003, CSUR.

[29]  Özgür Ulusoy,et al.  Static query result caching revisited , 2008, WWW.

[30]  Nimrod Megiddo,et al.  Outperforming LRU with an adaptive replacement cache algorithm , 2004, Computer.

[31]  Philip S. Yu,et al.  Caching on the World Wide Web , 1999, IEEE Trans. Knowl. Data Eng..

[32]  Wagner Meira,et al.  Rank-preserving two-level caching for scalable search engines , 2001, SIGIR '01.