XGrind: a query-friendly XML compressor

XML documents are extremely verbose because the "schema" is repeated for every "record" in the document. While a variety of compressors are available to address this problem, they are not designed to support direct querying of the compressed document, a useful feature from a database perspective. In this paper, we propose a new compression tool, called XGrind, that directly supports queries in the compressed domain. A special feature of XGrind is that the compressed document retains the structure of the original document, permitting reuse of standard XML techniques for processing the compressed document. Performance evaluations over a variety of XML documents and user queries indicate that XGrind simultaneously delivers improved query processing times and reasonable compression ratios.
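
To make the idea of querying in the compressed domain concrete, here is a toy sketch in Python. It is not XGrind's encoding scheme, which the abstract does not specify. It assumes a simple dictionary encoding in which every tag name and every text value is replaced by an integer code while the element nesting is kept as-is. The names compress, query, and count_matches are hypothetical, and the sketch makes no attempt at real bit-level compression.

```python
import xml.etree.ElementTree as ET


def compress(xml_text):
    """Toy structure-preserving encoder (an assumption, not XGrind's format).

    Returns (tree, tag_codes, value_codes), where tree is a nested tuple
    (tag_code, value_code_or_None, [children]) mirroring the XML nesting.
    """
    tag_codes, value_codes = {}, {}

    def code(table, key):
        # Assign the next free integer the first time a key is seen.
        return table.setdefault(key, len(table))

    def encode(elem):
        text = (elem.text or "").strip()
        return (
            code(tag_codes, elem.tag),
            code(value_codes, text) if text else None,
            [encode(child) for child in elem],
        )

    return encode(ET.fromstring(xml_text)), tag_codes, value_codes


def query(node, tag_code, value_code):
    """Count matching elements by comparing codes only; nothing is decoded."""
    tag, value, children = node
    hit = int(tag == tag_code and value == value_code)
    return hit + sum(query(child, tag_code, value_code) for child in children)


def count_matches(compressed, tag, value):
    """Encode the predicate once, then evaluate it on the compressed tree."""
    tree, tag_codes, value_codes = compressed
    if tag not in tag_codes or value not in value_codes:
        return 0  # the tag or value never occurs in the document
    return query(tree, tag_codes[tag], value_codes[value])


doc = (
    "<students>"
    "<student><name>Ann</name><dept>CS</dept></student>"
    "<student><name>Bob</name><dept>EE</dept></student>"
    "<student><name>Cy</name><dept>CS</dept></student>"
    "</students>"
)
compressed = compress(doc)
print(count_matches(compressed, "dept", "CS"))
```

Because the predicate dept = "CS" is translated into codes once and then compared against the encoded tree, the query never decompresses element content, and the traversal is an ordinary tree walk because the nesting of the original document is preserved. For the sample document the call prints 2.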
