A data mining approach to database compression

Data mining extracts valuable information from databases, supporting knowledge discovery and improving business intelligence. Databases store large volumes of structured data, and the amount of data keeps growing with advances in database technology and the extensive use of information systems. Although storage devices have become cheaper, efficient techniques for database compression remain important. This paper develops a database compression method that eliminates redundant data, which often exist in transaction databases. The proposed approach uses a data mining structure to extract association rules from a database. Redundant data are then replaced by means of compression rules. A heuristic method is designed to resolve conflicts among the compression rules. To assess its efficiency and effectiveness, the proposed approach is compared with two other database compression methods.
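
The idea of replacing redundant data through mined rules can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the mining structure or the conflict heuristic, so everything here is an assumption made for illustration. The sketch mines only single-attribute rules that hold in every matching row (so compression stays lossless), drops the implied cell, and resolves conflicts with a simple greedy rule: rules covering more rows are tried first, and an attribute already used as an antecedent in a row is never removed from that row. The names `mine_exact_rules`, `compress`, `decompress`, and the sample table are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_exact_rules(rows, min_support=2):
    """Find rules (attr_a = val_a) -> (attr_b = val_b) that hold in every
    row where attr_a = val_a (confidence 1.0), with enough support."""
    single = Counter()
    pair = Counter()
    for row in rows:
        items = list(row.items())
        single.update(items)
        pair.update(permutations(items, 2))  # ordered pairs within one row
    rules = [
        (lhs, rhs, count)
        for (lhs, rhs), count in pair.items()
        if count >= min_support and count == single[lhs]
    ]
    # Assumed heuristic: try rules that cover more rows first.
    rules.sort(key=lambda r: (-r[2], repr(r[:2])))
    return [(lhs, rhs) for lhs, rhs, _ in rules]


def compress(rows, rules):
    """Drop every cell that a rule can re-derive from a kept cell."""
    compressed = []
    for row in rows:
        kept = dict(row)
        locked = set()  # attributes used as antecedents must stay in the row
        for (a, va), (b, _vb) in rules:
            if b in locked or a not in kept or b not in kept:
                continue  # conflict: skip rather than break another rule
            if kept[a] == va:
                del kept[b]
                locked.add(a)
        compressed.append(kept)
    return compressed


def decompress(compressed, rules):
    """Refill dropped cells by applying the rules to the kept cells."""
    restored = []
    for row in compressed:
        full = dict(row)
        for (a, va), (b, vb) in rules:
            if b not in full and a in row and row[a] == va:
                full[b] = vb
        restored.append(full)
    return restored


# Hypothetical transaction table used only for illustration.
rows = [
    {"item": "milk", "dept": "dairy", "store": "S1"},
    {"item": "milk", "dept": "dairy", "store": "S2"},
    {"item": "cheese", "dept": "dairy", "store": "S1"},
    {"item": "bread", "dept": "bakery", "store": "S2"},
]
rules = mine_exact_rules(rows)
packed = compress(rows, rules)
assert decompress(packed, rules) == rows  # round trip is lossless
```

Restricting the sketch to rules with confidence 1.0 is a design choice that keeps decompression exact without storing exceptions. A real system would also have to account for the space taken by the rule table itself, which this sketch ignores.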
