Compact Snippet Caching for Flash-based Search Engines

In response to a user query, search engines return the top-k relevant results, each of which contains a small piece of text, called a snippet, extracted from the corresponding document. Obtaining a snippet is time-consuming because it requires both document retrieval (disk access) and string matching (CPU computation), so snippet caching is used to reduce latency. With the trend of using flash-based solid state drives (SSDs) instead of hard disk drives for search engine storage, the bottleneck of snippet generation shifts from I/O to computation. We propose a simple but effective method for exploiting this trend, which we call fragment caching: instead of caching the whole snippet, we cache only snippet metadata that describes how to retrieve the snippet from the document. While this approach increases I/O time, the cost is insignificant on SSDs. The major benefit of fragment caching is the ability to cache the same snippets, without loss of quality, while using only a fraction of the memory the traditional method requires. In our experiments, around 10 times less memory is required to achieve comparable snippet generation times with dynamic caching, and we consistently achieve a substantially higher hit ratio with static caching.
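To make the contrast concrete, the following is a minimal illustrative sketch, not the paper's implementation: a fragment-cache entry stores only a document identifier and character offsets, and the snippet is rebuilt on a hit by re-reading the document and slicing out the fragments, skipping the expensive query/document string matching. The DOCUMENTS store, the FragmentEntry structure, and the offsets are hypothetical stand-ins for the engine's document store and snippet metadata.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical in-memory document store; a real engine would read the
# document from the SSD-backed collection (cheap I/O on flash).
DOCUMENTS = {
    42: "Flash-based solid state drives shift the snippet bottleneck from I/O to CPU work.",
}

@dataclass
class FragmentEntry:
    """Compact cache entry: offsets into the document, not the snippet text itself."""
    doc_id: int
    spans: List[Tuple[int, int]]  # (start, end) character offsets of each fragment

def materialize_snippet(entry: FragmentEntry, separator: str = " ... ") -> str:
    """Rebuild the snippet on a cache hit: one document read plus cheap slicing."""
    doc = DOCUMENTS[entry.doc_id]          # document retrieval (fast on SSDs)
    return separator.join(doc[s:e] for s, e in entry.spans)

# A full-snippet cache would store the snippet string; a fragment cache stores
# only a few integers per result, so many more entries fit in the same memory.
entry = FragmentEntry(doc_id=42, spans=[(0, 30), (60, 75)])
print(materialize_snippet(entry))
```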