XQueC: A query-conscious compressed XML database

XML compression has gained prominence recently because it counters the verbosity of the XML representation of data. In many applications, such as data exchange and data archiving, compressing and decompressing an entire document is acceptable. In other applications, where queries must be run over compressed documents, compression may not be beneficial, since the performance penalty of running the query processor over compressed data can outweigh the benefits of compression. While balancing the interests of compression and query processing has received significant attention in the domain of relational databases, these results do not immediately translate to XML data. In this article, we address the problem of embedding compression into XML databases without degrading query performance. Since the setting is rather different from relational databases, the choice of compression granularity and compression algorithms must be revisited. Query execution in the compressed domain must also be rethought in the framework of XML query processing, due to the richer structure of XML data. A proper storage design for the compressed data plays a crucial role here. The XQueC system (XQuery Processor and Compressor) covers a wide set of XQuery queries in the compressed domain and relies on a workload-based cost model to choose the compression granules and their corresponding compression algorithms. As a consequence, XQueC provides efficient query processing on compressed XML data. An extensive experimental assessment is presented, showing the effectiveness of the cost model, the compression ratios, and the query execution times.
