XMill: an efficient compressor for XML data

We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, combines existing compressors in order to apply them to heterogeneous XML data. It uses zlib (the library behind gzip), a collection of datatype-specific compressors for simple data types, and, optionally, user-defined compressors for application-specific data types.
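The idea of routing different kinds of XML data to different compressors can be illustrated with a minimal Python sketch. It is not XMill's implementation, and the abstract does not specify how values are grouped. The sketch assumes that text values are grouped by element tag into "containers", that integer containers use a simple delta encoding as a stand-in for a datatype-specific compressor, and that every container is then passed through zlib. The names `split_containers`, `encode_values`, and `compress` are hypothetical.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict


def split_containers(xml_text):
    """Separate element tags from text values, grouping values by tag (assumed scheme)."""
    root = ET.fromstring(xml_text)
    structure = []                   # tag names in document order
    containers = defaultdict(list)   # tag name -> list of text values
    for elem in root.iter():
        structure.append(elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
    return structure, dict(containers)


def encode_values(values):
    """Illustrative datatype-specific step: deltas for integers, plain text otherwise."""
    try:
        nums = [int(v) for v in values]
    except ValueError:
        # Not all integers: keep the values as newline-separated text.
        return b"T" + "\n".join(values).encode("utf-8")
    # All integers: store the first value, then successive differences.
    deltas = [nums[0]] + [b - a for a, b in zip(nums, nums[1:])]
    return b"I" + " ".join(str(d) for d in deltas).encode("ascii")


def compress(xml_text):
    """Compress the tag sequence and each value container separately with zlib."""
    structure, containers = split_containers(xml_text)
    blobs = {"<structure>": zlib.compress(" ".join(structure).encode("utf-8"))}
    for tag, values in containers.items():
        blobs[tag] = zlib.compress(encode_values(values))
    return blobs


if __name__ == "__main__":
    doc = ("<log><e><id>100</id><msg>ok</msg></e>"
           "<e><id>101</id><msg>fail</msg></e></log>")
    for name, blob in compress(doc).items():
        print(name, len(blob), "bytes")
```

The sketch records only the flat tag sequence, not nesting or attributes, so it cannot reconstruct the original document. It is meant only to show how homogeneous values can be collected and handed to a type-aware encoder before a general-purpose compressor runs.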
