Supporting efficient query processing on compressed XML files

XML has been widely accepted as the de facto format for data representation and exchange. However, it is also known for the excessive redundancy of its representation. Various compression schemes have been proposed, and some of them support query processing over compressed files, but they usually cannot avoid partial (or full) decompression, which is expensive and in some cases dominates the query processing time.

In this paper, we propose a new XML compression scheme based on the Sequitur compression algorithm. By organizing the compression result as a set of context-free grammar rules, the scheme supports efficient processing of XPath queries without decompression. Experimental results show that the scheme achieves a compression ratio comparable to that of gzip, while its query processing time is among the best of existing algorithms.
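To make the idea of querying a grammar without decompressing it concrete, the following is a minimal sketch and not the scheme proposed in the paper. It makes two simplifying assumptions: the grammar is built by a Re-Pair-style greedy pair replacement rather than Sequitur, and the "query" is a plain tag count rather than an XPath expression. All names (`build_grammar`, `count_tag`, `expand`) and the sample token stream are hypothetical and chosen only for illustration.

```python
from collections import Counter

def build_grammar(tokens):
    """Greedy pair replacement (Re-Pair style, NOT Sequitur).

    Terminals are strings; nonterminals are integers.
    Returns (start_sequence, rules) where rules[nt] is a pair of symbols.
    """
    seq = list(tokens)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # no adjacent pair repeats, so stop
        nt = next_id
        next_id += 1
        rules[nt] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def count_tag(start, rules, tag):
    """Count occurrences of `tag` by walking the rules, never expanding them.

    Each rule is evaluated once and its count is reused wherever it appears.
    """
    memo = {}

    def count(sym):
        if isinstance(sym, str):
            return 1 if sym == tag else 0
        if sym not in memo:
            memo[sym] = sum(count(s) for s in rules[sym])
        return memo[sym]

    return sum(count(s) for s in start)

def expand(start, rules):
    """Full decompression, used here only to check that the grammar is lossless."""
    out, stack = [], list(reversed(start))
    while stack:
        sym = stack.pop()
        if isinstance(sym, str):
            out.append(sym)
        else:
            stack.extend(reversed(rules[sym]))
    return out

# Hypothetical token stream for a small, repetitive XML structure.
item = ["<item>", "<name>", "</name>", "<price>", "</price>", "</item>"]
tokens = ["<items>"] + item * 4 + ["</items>"]

start, rules = build_grammar(tokens)
print(len(tokens), "tokens compressed to a start sequence of", len(start))
print("<item> count from the grammar:", count_tag(start, rules, "<item>"))
```

Because every rule's count is memoized, the work is proportional to the size of the grammar rather than the size of the original document, which is the general reason grammar-based representations can answer some structural queries without decompression.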
