Data compression in database systems

This paper addresses the question of how information-theoretically derived compact representations can be applied in practice to improve storage and processing efficiency in database management systems (DBMSs). Compact data representation has the potential to reduce storage, access and processing costs throughout the system architecture, and may alter the balance of usage between disk and solid-state storage. To realise these performance benefits, however, novel systems engineering must be adopted to ensure that compression and decompression overheads are kept low. This paper describes a basic approach to storing and processing relations in a highly compressed form. A vertical, column-wise representation is adopted in which each column can vary dynamically and incrementally in both length and width. To achieve good performance, query processing is carried out directly on the compressed relational representation, using a compressed representation of the query, thus avoiding decompression overheads. Performance measurements of the Hi-base prototype implementation are compared with those obtained from conventional DBMSs.
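The two central ideas above, a column that grows in both length and width and a query evaluated without decompressing the column, can be illustrated with a small sketch. It uses plain dictionary encoding, which is an assumption and not necessarily the scheme Hi-base uses. Here "width" is taken to mean the number of bits per stored code, and the class `CompressedColumn` and its methods (`append`, `select_eq`, `decode`) are hypothetical names invented for this example. The query literal is encoded once through the dictionary, and rows are then compared code-to-code, so no stored value is expanded during the selection.

```python
class CompressedColumn:
    """Hypothetical dictionary-encoded column (illustration only, not Hi-base's format).

    Each distinct value receives an integer code. Codes are bit-packed into one
    Python int using the minimum width needed for the current dictionary size.
    """

    def __init__(self):
        self.codes = {}    # value -> code
        self.values = []   # code -> value
        self.width = 1     # bits per packed code (assumed meaning of "width")
        self.length = 0    # number of rows stored
        self.bits = 0      # row i occupies bits [i*width, (i+1)*width)

    def _get(self, i):
        # Extract the packed code for row i at the current width.
        mask = (1 << self.width) - 1
        return (self.bits >> (i * self.width)) & mask

    def _repack(self, new_width):
        # Widen every stored code: the column grows in width, not only in length.
        old = [self._get(i) for i in range(self.length)]
        self.width = new_width
        self.bits = 0
        for i, code in enumerate(old):
            self.bits |= code << (i * self.width)

    def append(self, value):
        code = self.codes.get(value)
        if code is None:
            # New distinct value: extend the dictionary, widening codes if needed.
            code = len(self.values)
            self.codes[value] = code
            self.values.append(value)
            needed = max(1, code.bit_length())
            if needed > self.width:
                self._repack(needed)
        self.bits |= code << (self.length * self.width)
        self.length += 1

    def select_eq(self, literal):
        """Row ids whose value equals literal, comparing codes only (no decompression)."""
        code = self.codes.get(literal)   # "compress" the query literal once
        if code is None:                 # literal not in dictionary: no row can match
            return []
        return [i for i in range(self.length) if self._get(i) == code]

    def decode(self, i):
        # Decompress a single row, only needed when a result must be shown.
        return self.values[self._get(i)]


col = CompressedColumn()
for city in ["Glasgow", "Paris", "Glasgow", "Oslo", "Rome", "Paris"]:
    col.append(city)
print(col.width)                 # 2
print(col.select_eq("Paris"))    # [1, 5]
```

With four distinct values the six rows occupy 12 bits in total; the width grew from 1 to 2 bits when the third distinct value arrived. A real system would store codes in fixed-size words rather than one arbitrary-precision integer and would avoid rewriting the whole column on each widening, but the sketch keeps the repacking explicit to show the incremental change in width.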
