Cache-Conscious Automata for XML Filtering

Hardware cache behavior is an important factor in the performance of memory-resident, data-intensive systems such as XML filtering engines. A key data structure in several recent XML filters is the automaton, which represents long-running XML queries in main memory. In this paper, we study the cache performance of automaton-based XML filtering through analytical modeling and system measurement. Furthermore, we propose a cache-conscious automaton organization technique, called the hot buffer, to improve the locality of automaton state transitions. Our results show that 1) our cache performance model for XML filtering automata is highly accurate and 2) the hot buffer improves both the cache performance and the overall performance of automaton-based XML filtering.

[19]  Bingsheng He,et al.  Cache-Conscious Automata for XML Filtering , 2005, IEEE Transactions on Knowledge and Data Engineering.
