Difference Computation for Grammar-Compressed XML Data

When web data processing requires storing or exchanging multiple versions of large XML data collections, the sheer size of the data can become a bottleneck for storage and for fast data exchange over the web. XML compression and XML version control can help to avoid this bottleneck. Grammar-based compression is of increasing importance for large XML data collections on the web because it allows fast queries and updates on compressed data without full decompression. However, merging different versions of grammar-compressed XML data collections is a challenge, because small differences between two uncompressed XML files may lead to significant differences in their grammar-compressed representations. Consequently, when multiple versions of an XML file have to be stored in compressed form, the compressed representations may be difficult to combine, which weakens the benefit achieved by compression. To overcome this problem, we present a technique that computes the common part and the difference of two compressed XML documents without fully decompressing them. Our approach computes a compressed common prefix and parameters representing the difference of two compressed XML documents in time polynomial in the size of the grammar-compressed documents, even if the common part of the documents is hidden in completely different sets of grammar rules.
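To make the idea of comparing compressed documents without decompressing them concrete, the following minimal sketch may help. It is not the algorithm of this paper. It assumes a simplified setting in which each document is a plain string compressed by a straight-line grammar (every rule is either a single terminal or a pair of rule names), whereas the paper targets grammar-compressed XML trees. It also substitutes a well-known randomized technique, polynomial fingerprints combined with binary search, for the deterministic computation described above. The names `SLP`, `prefix_hash`, and `common_prefix_length` are hypothetical and chosen for illustration only.

```python
MOD = (1 << 61) - 1   # large prime modulus for the fingerprints (assumed choice)
BASE = 1_000_003      # fingerprint base (assumed choice)


class SLP:
    """Straight-line grammar: a rule is one terminal character or a pair of rule names."""

    def __init__(self, rules, start):
        self.rules = rules
        self.start = start
        self._memo = {}   # rule name -> (expanded length, fingerprint)

    def _info(self, name):
        # Length and fingerprint of a rule's expansion, computed once per rule.
        if name not in self._memo:
            rhs = self.rules[name]
            if isinstance(rhs, str):
                self._memo[name] = (1, ord(rhs) % MOD)
            else:
                (llen, lhash), (rlen, rhash) = self._info(rhs[0]), self._info(rhs[1])
                # hash(uv) = hash(u) * BASE^|v| + hash(v)
                self._memo[name] = (llen + rlen, (lhash * pow(BASE, rlen, MOD) + rhash) % MOD)
        return self._memo[name]

    def length(self):
        return self._info(self.start)[0]

    def prefix_hash(self, k):
        """Fingerprint of the first k symbols, found by walking one root-to-leaf path."""
        k = min(k, self.length())
        h, name = 0, self.start
        while k > 0:
            rhs = self.rules[name]
            if isinstance(rhs, str):          # a single terminal: k is 1 here
                h = (h * BASE + ord(rhs)) % MOD
                break
            left, right = rhs
            llen, lhash = self._info(left)
            if k >= llen:                     # take the whole left part, continue right
                h = (h * pow(BASE, llen, MOD) + lhash) % MOD
                k -= llen
                name = right
            else:                             # the prefix ends inside the left part
                name = left
        return h


def common_prefix_length(a, b):
    """Largest k with equal prefix fingerprints (correct up to hash collisions)."""
    lo, hi = 0, min(a.length(), b.length())
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if a.prefix_hash(mid) == b.prefix_hash(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


# Version 1 expands to "abababab", built from the repeated pair "ab".
doc_v1 = SLP({"A": "a", "B": "b", "X": ("A", "B"), "Y": ("X", "X"), "S": ("Y", "Y")}, "S")

# Version 2 expands to "abababac", built from the repeated pair "ba" instead.
doc_v2 = SLP({"A": "a", "B": "b", "C": "c", "P": ("B", "A"), "Q": ("P", "P"),
              "R": ("A", "Q"), "U": ("R", "P"), "S": ("U", "C")}, "S")

print(common_prefix_length(doc_v1, doc_v2))  # 7: the shared prefix is "abababa"
```

The two grammars share no rule with the same expansion apart from the terminals, yet the common prefix is found by inspecting only rule lengths and fingerprints. Each fingerprint query follows a single path through the grammar, so the work depends on the grammar's size and depth and on the logarithm of the expanded length, not on the expanded length itself.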
