YAQCX: A Word-based Query-aware Compressor for XML Data

XML has become a de facto standard for data exchange over the Internet. However, efficiently storing and querying XML data remains an open problem. In this paper we present YAQCX (Yet Another Query-aware Compressor for XML). YAQCX combines word-based modeling with byte-coding to compress, decompress, and query XML data very efficiently. It also implements a subset of XPath with a powerful pattern-matching extension that supports regular expressions, range queries, and partial matching. Additionally, when processing queries, it accesses the actual compressed data as little as possible, for example to evaluate predicates on content or to display results. Our experiments show that YAQCX achieves compression ratios comparable to XMill’s and very close to those of other query-aware compressors, such as XQzip and XGrind. They also show that YAQCX compresses and decompresses faster than XMill and outperforms XGrind in query processing.
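
To make the combination of word-based modeling and byte-coding concrete, the following is a minimal sketch and not the YAQCX implementation. It assumes a vocabulary ranked by token frequency and a plain variable-byte code in which the high bit marks the last byte of each codeword; the abstract does not specify the coder YAQCX actually uses, and all function names here (build_vocabulary, encode_rank, compress, decompress) are hypothetical.

```python
import re
from collections import Counter


def build_vocabulary(text):
    """Tokenize text and rank distinct tokens by frequency (assumed model)."""
    # Alternate runs of word and non-word characters so that joining the
    # tokens reproduces the input exactly.
    tokens = re.findall(r"\w+|\W+", text)
    counts = Counter(tokens)
    # Most frequent tokens receive the smallest ranks, hence the shortest codes.
    vocab = [tok for tok, _ in counts.most_common()]
    return tokens, vocab


def encode_rank(rank):
    """Variable-byte code: 7 payload bits per byte, high bit set on the last byte."""
    out = bytearray()
    while rank >= 128:
        out.append(rank & 0x7F)
        rank >>= 7
    out.append(rank | 0x80)
    return bytes(out)


def compress(text):
    """Return the vocabulary and the byte-coded token stream."""
    tokens, vocab = build_vocabulary(text)
    rank_of = {tok: i for i, tok in enumerate(vocab)}
    data = b"".join(encode_rank(rank_of[t]) for t in tokens)
    return vocab, data


def decompress(vocab, data):
    """Decode the byte stream back into the original text."""
    out, rank, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            # Last byte of a codeword: finish the rank and emit its token.
            rank |= (byte & 0x7F) << shift
            out.append(vocab[rank])
            rank, shift = 0, 0
        else:
            rank |= byte << shift
            shift += 7
    return "".join(out)
```

Because every codeword in this sketch occupies a whole number of bytes and ends on a flagged byte, a decoder can process the stream byte by byte without bit manipulation across byte boundaries, which is one reason byte-oriented codes are attractive when decompression and search speed matter.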