Paper information

Locality in search engine queries and its implications for caching

Caching is a popular technique for reducing both server load and user response time in distributed systems. We consider the question of whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis produced three results. First, queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are from the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. Second, if caching is done at the user side, short-term caching for hours is enough to cover query temporal locality, while server/proxy caching should use longer periods, such as days. Third, most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user-side caching, prefetching based on the user lexicon looks promising.
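The locality measurements summarized in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical trace format of `(user_id, query)` pairs in arrival order; the function names `locality_stats` and `rank_frequency` are illustrative and do not come from the paper. The first function counts how many queries repeat an earlier query, and how many of those repeats come from the same user; the second produces the rank/frequency pairs one would plot on log-log axes to check for a Zipf-like shape.

```python
from collections import Counter


def locality_stats(trace):
    """Count repeated queries in a trace of (user_id, query) pairs.

    The trace format is an assumption made for this sketch.
    """
    seen_by_anyone = set()   # queries submitted so far by any user
    seen_by_user = set()     # (user, query) pairs submitted so far
    total = repeated = repeated_same_user = 0
    for user, query in trace:
        total += 1
        if query in seen_by_anyone:
            repeated += 1
            # A repeat by the same user is a candidate for user-side caching.
            if (user, query) in seen_by_user:
                repeated_same_user += 1
        seen_by_anyone.add(query)
        seen_by_user.add((user, query))
    return {
        "total": total,
        "repeated": repeated,
        "repeated_same_user": repeated_same_user,
    }


def rank_frequency(trace):
    """Return (rank, frequency) pairs, most frequent query first."""
    counts = Counter(query for _, query in trace)
    return [
        (rank, freq)
        for rank, (_, freq) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    sample = [("u1", "a"), ("u2", "a"), ("u1", "a"),
              ("u1", "b"), ("u2", "c"), ("u2", "c")]
    print(locality_stats(sample))
    print(rank_frequency(sample))
```

Repeats by a different user suggest sharing that a server or proxy cache could exploit, while repeats by the same user are the ones a user-side cache would serve.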

Yinglian Xie | David R. O'Hallaron

[1] Amanda Spink, et al. Real life information retrieval: a study of user queries on the Web, 1998, SIGF.

[2] Craig Silverstein, et al. Analysis of a Very Large Altavista Query Log, SRC Technical Note #1998-14, 1998.

[3] Joel H. Saltz, et al. Using Distributed Query Result Caching to Evaluate Queries for Parallel Data Mining Algorithms, 1998.

[4] Evangelos P. Markatos. On Caching Search Engine Results, 2000.

[5] Anja Feldmann, et al. Performance of Web Proxy Caching in Heterogeneous Environments, 1999, INFOCOM 1999.

[6] Amanda Spink, et al. Real life information retrieval: a study of user queries on the Web, 1998.

[7] Alex Rousskov. On Performance of Caching Proxies, 1998, SIGMETRICS 1998.

[8] Prashant J. Shenoy, et al. Implications of proxy caching for provisioning networks and servers, 2000, SIGMETRICS '00.

[9] Carlos Maltzahn, et al. Performance issues of enterprise level web proxies, 1997, SIGMETRICS '97.

[10] Evangelos P. Markatos, et al. On caching search engine query results, 2001, Comput. Commun.

[11] Darrell D. E. Long, et al. Exploring the Bounds of Web Latency Reduction from Caching and Prefetching, 1997, USENIX Symposium on Internet Technologies and Systems.

[12] Wei Lin, et al. Web prefetching between low-bandwidth clients and proxies: potential and performance, 1999, SIGMETRICS '99.

[13] K. Selçuk Candan, et al. Query caching and optimization in distributed mediator systems, 1996, SIGMOD '96.