A five-level static cache architecture for web search engines

Caching is a crucial performance component of large-scale web search engines, as it greatly helps reduce average query response times and the query processing workload on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic that prioritizes items for caching based on gains computed from the items' past access frequencies, estimated computational costs, and storage overheads. This heuristic takes into account the inter-dependency between individual items when making its caching decisions: after a particular item is cached, the gains of all items affected by this decision are updated. Our simulations under realistic assumptions reveal that the proposed heuristic performs better than dividing the entire cache space among the item types in fixed proportions.
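The greedy selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the gain formula or the exact update rule, so everything below is an assumption made for illustration: the gain is taken as frequency times cost divided by size, and the hypothetical `absorbs` field records how many accesses a cached item removes from other items (for example, a cached query result removes accesses to the posting lists of its terms). The names `Item`, `gain`, and `greedy_fill` are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    freq: float   # past access frequency
    cost: float   # estimated computational cost saved per cache hit
    size: float   # storage overhead in the cache
    # Assumed dependency model: name of another item -> number of its
    # accesses that disappear once this item is cached.
    absorbs: dict = field(default_factory=dict)


def gain(item):
    # Assumed gain function: cost saved per unit of cache space.
    return item.freq * item.cost / item.size


def greedy_fill(items, capacity):
    """Pick items greedily by gain, updating dependent items after each pick."""
    # Work on copies so the caller's items are not modified.
    pool = {it.name: Item(it.name, it.freq, it.cost, it.size, dict(it.absorbs))
            for it in items}
    chosen = []
    free = capacity
    while True:
        fitting = [it for it in pool.values() if it.size <= free and it.freq > 0]
        if not fitting:
            break
        best = max(fitting, key=gain)
        chosen.append(best.name)
        free -= best.size
        del pool[best.name]
        # Caching `best` lowers the access frequency (and so the gain)
        # of every item it makes partly redundant.
        for name, removed in best.absorbs.items():
            if name in pool:
                pool[name].freq = max(0.0, pool[name].freq - removed)
    return chosen


items = [
    Item("result:a b", freq=10, cost=10, size=1,
         absorbs={"list:a": 10, "list:b": 10}),
    Item("list:a", freq=12, cost=4, size=2),
    Item("list:b", freq=30, cost=4, size=4),
    Item("doc:1", freq=5, cost=2, size=1),
]
print(greedy_fill(items, capacity=3))
```

In this toy input, caching the query result first removes most accesses to the posting list of term `a`, so the document overtakes it in the next round. A selection that never updated gains would have chosen `list:a` instead.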
