A Hybrid Cache and Prefetch Mechanism for Scientific Literature Search Engines

CiteSeer, a scientific literature search engine that focuses on documents in the computer science and information science domains, suffers from scalability issues: both the number of requests and the size of the indexed document collection have increased dramatically over the years. CiteSeerχ is an effort to re-architect the search engine. In this paper, we present our initial design of a framework for caching query results, indices, and documents. The design is based on an analysis of the logged workload in CiteSeer. Our experiments, which use mock client requests that simulate actual user behavior, confirm that our approach improves system performance.
