Caching query-biased snippets for efficient retrieval

Web Search Engines' result pages contain references to the top-k documents relevant for the query submitted by a user. Each document is represented by a title, a snippet and a URL. Snippets, i.e. short sentences showing the portions of the document being relevant to the query, help users to select the most interesting results. The snippet generation process is very expensive, since it may require to access a number of documents for each issued query. We assert that caching, a popular technique used to enhance performance at various levels of any computing systems, can be very effective in this context. We design and experiment several cache organizations, and we introduce the concept of supersnippet, that is the set of sentences in a document that are more likely to answer future queries. We show that supersnippets can be built by exploiting query logs, and that in our experiments a supersnippet cache answers up to 62% of the requests, remarkably outperforming other caching approaches.

[1]  Hugh E. Williams,et al.  Fast generation of result snippets in web search , 2007, SIGIR.

[2]  H. P. Edmundson,et al.  New Methods in Automatic Extracting , 1969, JACM.

[3]  Aristides Gionis,et al.  Design trade-offs for search engine caching , 2008, TWEB.

[4]  Torsten Suel,et al.  Improved techniques for result caching in web search engines , 2009, WWW '09.

[5]  Fabrizio Silvestri,et al.  Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data , 2006, TOIS.

[6]  Yinglian Xie,et al.  Locality in search engine queries and its implications for caching , 2002, Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[7]  Fabrizio Silvestri,et al.  Mining Query Logs: Turning Search Usage Data into Knowledge , 2010, Found. Trends Inf. Retr..

[8]  Evangelos P. Markatos,et al.  On caching search engine query results , 2001, Comput. Commun..

[9]  장훈,et al.  [서평]「Computer Organization and Design, The Hardware/Software Interface」 , 1997 .

[10]  Gerard Salton,et al.  Research and Development in Information Retrieval , 1982, Lecture Notes in Computer Science.

[11]  Dragomir R. Radev,et al.  Introduction to the Special Issue on Summarization , 2002, CL.

[12]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[13]  Shlomo Moran,et al.  Predictive caching and prefetching of query results in search engines , 2003, WWW '03.

[14]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[15]  Tapas Kanungo,et al.  Machine Learned Sentence Selection Strategies for Query-Biased Summarization , 2008 .

[16]  David A. Patterson,et al.  Computer Organization and Design, Fourth Edition, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) , 2008 .

[17]  Ricardo A. Baeza-Yates,et al.  Challenges on Distributed Web Retrieval , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[18]  Justin Zobel,et al.  Document Compaction for Efficient Query Biased Snippet Generation , 2009, ECIR.

[19]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[20]  Jie Lu,et al.  Pruning long documents for distributed information retrieval , 2002, CIKM '02.