An In-Memory XQuery/XPath Engine over a Compressed Structured Text Representation

We describe the architecture and main algorithmic design decisions for an XQuery/XPath processing engine over XML collections represented using a self-indexing approach, that is, a compressed representation that supports basic searching and navigational operations directly on the compressed form. The goal is a structure that occupies little space and thus permits manipulating large collections in main memory.

1 Generalities

In principle we aim at a static representation, because it will be significantly faster and easier to program (a good part already exists). Only for the text will we use a dynamic representation at construction time, so as to permit building the index in compressed form. Let u be the total length of the collection (measured in symbols), n be the total number of
