Query-Aware Complex Object Buffer Management in XML Information Retrieval

In this paper, we analyse the data access characteristics of a typical XML information retrieval system and propose a new query-aware buffer replacement algorithm based on the prediction of Minimum Reuse Distance (MRD for short). The algorithm predicts each object's next reference distance from the retrieval system's running status and replaces the objects with the largest predicted reuse distances. The factors considered in the replacement algorithm include the access frequency, creation cost, and size of objects, as well as the queries being executed. By taking into account the queries currently running or queued in the system, the MRD algorithm can predict the reuse distances of index data objects more accurately.
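The replacement idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the prediction formula, so the class `QueryAwareBuffer`, the `pending` list (the keys that running or queued queries are expected to touch, in order), the history-based fallback estimate, and the eviction score `distance * size / cost` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int             # space the object occupies in the buffer
    cost: float           # cost of re-creating the object after a miss
    hits: int = 1         # access frequency observed so far
    last_access: int = 0  # logical time of the most recent access


class QueryAwareBuffer:
    """Hypothetical buffer that evicts the object with the largest
    predicted reuse distance, weighted by size and creation cost."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.clock = 0
        self.entries = {}

    def reuse_distance(self, key, pending):
        # Objects needed by running or queued queries get a short,
        # known distance: their position in the pending list.
        if key in pending:
            return pending.index(key) + 1
        # Assumed fallback: objects accessed rarely or long ago are
        # predicted to be reused later than anything in the queue.
        e = self.entries[key]
        age = self.clock - e.last_access
        return len(pending) + 1 + age / e.hits

    def victim(self, pending):
        # Assumed score: far-away, large, cheap-to-rebuild objects go first.
        def score(key):
            e = self.entries[key]
            return self.reuse_distance(key, pending) * e.size / e.cost
        return max(self.entries, key=score)

    def access(self, key, size, cost, pending=()):
        """Return True on a hit, False on a miss."""
        self.clock += 1
        pending = list(pending)
        if key in self.entries:
            e = self.entries[key]
            e.hits += 1
            e.last_access = self.clock
            return True
        if size > self.capacity:
            return False  # too large to buffer at all
        while self.used + size > self.capacity:
            evicted = self.entries.pop(self.victim(pending))
            self.used -= evicted.size
        self.entries[key] = Entry(size, cost, last_access=self.clock)
        self.used += size
        return False


buf = QueryAwareBuffer(capacity=10)
buf.access("a", size=4, cost=1.0)
buf.access("b", size=4, cost=1.0)
# A queued query is known to need "a" next, so "b" is evicted instead.
buf.access("c", size=4, cost=1.0, pending=["a"])
print(sorted(buf.entries))  # ['a', 'c']
```

In this example a plain LRU policy would evict "a", the least recently used object; the sketch keeps it because a queued query is about to reference it, which is the kind of benefit the query-aware prediction is meant to provide.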
