Compression of High-dimensional Data Spaces Using Non-differential Augmented Vector Quantization

Most data-intensive applications face I/O bottlenecks, slow query processing, and large storage requirements. Database compression alleviates the I/O bottleneck: it reduces disk space usage, improves disk access speed, shortens query response and overall retrieval times, and increases the effective I/O bandwidth. However, random access to individual tuples in a compressed database is very difficult to achieve with most available compression techniques. This paper reports a lossless compression technique called non-differential augmented vector quantization. The technique applies to a collection of tuples and is especially effective for tuples with numerous low- to medium-cardinality fields. In addition, it supports standard database operations and permits very fast random access and atomic decompression of tuples in large collections. The technique maps a database relation into a static, cached bitmap index access structure. Consequently, we were able to achieve substantial savings in space by storing each database tuple as a bit value in computer memory. An important distinguishing characteristic of our technique is that tuples can be compressed and decompressed individually, rather than a full page or an entire relation at a time. Furthermore, the information needed for tuple compression and decompression can reside in memory. Possible application domains include decision support systems and statistical and life databases with low-cardinality fields and possibly no text fields.
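To make the idea of storing each tuple as a single bit value concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes every field has a small, known domain, dictionary-encodes each field, and packs the field codes into one integer with a mixed-radix scheme. The class name `TuplePacker`, its methods, and the sample domains are hypothetical and chosen only for illustration.

```python
from math import prod


class TuplePacker:
    """Illustrative mixed-radix packer for tuples of low-cardinality fields."""

    def __init__(self, domains):
        # domains: one list of distinct values per field (assumed known up front).
        self.domains = [list(d) for d in domains]
        self.codes = [{v: i for i, v in enumerate(d)} for d in self.domains]
        self.radices = [len(d) for d in self.domains]

    def bits_per_tuple(self):
        # Bits needed to represent the largest possible tuple code.
        return max(1, (prod(self.radices) - 1).bit_length())

    def compress(self, row):
        # Fold the per-field codes into one integer, most significant field first.
        code = 0
        for value, lookup, radix in zip(row, self.codes, self.radices):
            code = code * radix + lookup[value]
        return code

    def decompress(self, code):
        # Peel off fields from least significant to most significant.
        values = []
        for domain, radix in zip(reversed(self.domains), reversed(self.radices)):
            code, digit = divmod(code, radix)
            values.append(domain[digit])
        return tuple(reversed(values))


# Hypothetical three-field relation: 2 * 3 * 4 = 24 possible tuples.
packer = TuplePacker([
    ["F", "M"],
    ["single", "married", "divorced"],
    ["N", "S", "E", "W"],
])
code = packer.compress(("M", "married", "E"))
row = packer.decompress(code)
```

Each tuple is encoded and decoded on its own, using only the per-field dictionaries held in memory, which mirrors the abstract's claims of tuple-level random access and atomic decompression. The lossless mapping here relies on the domains being fixed in advance; how the paper handles updates or larger domains is not stated in the abstract.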
