Syntactical Compression of XML Data

One of the most palpable drawbacks of XML is its excessive storage requirements. In this paper, we address this problem by proposing a syntactical XML compression scheme that uses probabilistic modeling of XML structure. The scheme processes the data sequentially, which makes on-line processing possible. We describe the current state of development of the prototype compressor and present preliminary performance evaluation results. The compressor is designed to be extensible and is intended to serve as a platform for further research in syntactical XML data compression.
