Efficient Query Evaluation over Compressed XML Data

XML suffers from high redundancy. Compression can therefore be beneficial for XML data; once compressed, however, the data can seldom be browsed and queried efficiently. To address this problem, we propose XQueC, an [XQue]ry processor and [C]ompressor, which covers a large set of XQuery queries in the compressed domain. We shred compressed XML into suitable data structures, aiming both to reduce memory usage at query time and to query the data while it is compressed. XQueC is the first system to take advantage of a query workload to choose the compression algorithms and to group the compressed data granules according to their common properties. Through experiments, we show that good trade-offs between compression ratio and query capability can be achieved in several real cases, such as those covered by an XML benchmark. On average, XQueC improves over previous query-aware XML compression systems while remaining reasonably close to general-purpose, query-unaware XML compressors. Finally, query execution times (QETs) for a wide variety of queries show that XQueC can reach speeds comparable to those of XQuery engines on uncompressed data.
