Data Compression Support in Databases

Computers running database management applications often manage large amounts of data. Typically, the price of the I/O subsystem is a considerable portion of the cost of the computing hardware. Fierce price competition demands every possible saving. Lossless data compression methods, when appropriately integrated with the DBMS, yield significant savings. Roughly speaking, a slight increase in CPU cycles is more than offset by savings in the I/O subsystem. Various design issues arise in the use of data compression in the DBMS: the choice of algorithm, statistics collection, hardware versus software based compression, the location of the compression function in the overall computer system architecture, the unit of compression, update in place, and the application of the log to compressed data. These are methodically examined and trade-offs discussed in the context of choices made for IBM's DB2 DBMS product.
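
As a rough illustration of one design issue named above, the unit of compression, the following minimal sketch compares compressing each record on its own with compressing a page of records together. It is not the method used in DB2: Python's zlib (DEFLATE) serves only as a stand-in lossless compressor, and the record layout, the helper names (`make_rows`, `compress_per_row`, `compress_per_page`), and the page size of 50 rows are all assumptions made for this example.

```python
import zlib


def make_rows(n):
    # Hypothetical fixed-format records, invented for this sketch.
    return [
        f"{i:06d}|CUSTOMER_{i % 10:02d}|SAN JOSE|CA|ACTIVE".encode("ascii")
        for i in range(n)
    ]


def compress_per_row(rows):
    # Unit of compression = one record: each row is compressed independently,
    # so a single row can be read or updated without touching its neighbours.
    return [zlib.compress(row) for row in rows]


def compress_per_page(rows, rows_per_page):
    # Unit of compression = one page: rows are joined and compressed together,
    # letting the compressor exploit redundancy across rows.
    pages = []
    for start in range(0, len(rows), rows_per_page):
        page = b"\n".join(rows[start:start + rows_per_page])
        pages.append(zlib.compress(page))
    return pages


rows = make_rows(200)
raw_bytes = sum(len(r) for r in rows)
row_bytes = sum(len(c) for c in compress_per_row(rows))
page_bytes = sum(len(c) for c in compress_per_page(rows, 50))

print("raw:", raw_bytes, "per-row:", row_bytes, "per-page:", page_bytes)
```

On short records like these, per-row compression has very little context to work with, while per-page compression saves space at the cost of decompressing a whole page to reach one row. This is the kind of trade-off the abstract refers to; the actual numbers depend entirely on the data and the compressor.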
