Improving semantic compression specification in large relational database

Large-scale relational databases are typically very large and highly sparse, which makes compression important both for improving performance and for saving storage space. Standard syntactic compression techniques such as Gzip or Zip do not take advantage of relational properties, because they ignore the nature of the data. Semantic compression is very effective for table data because it exploits two things: the meanings and dynamic error ranges of individual attributes (lossy compression), and the existing data dependencies and correlations between attributes in the table (lossless compression). Inspired by semantic compression, this study proposes a novel, independent lossless compression system. It uses a data-mining model to find the frequent pattern with maximum gain (the representative row) in order to capture attribute semantics, together with a modified version of an augmented vector quantisation coder to increase the total throughput of database compression. The algorithm offers finer granularity and is suitable for all kinds of massive data tables, taking compression ratio, space, and speed into account together. Experiments with several very large real-life datasets indicate that the system outperforms previously known lossless semantic techniques.
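The idea of a representative row can be illustrated with a minimal sketch. This is not the system proposed in the study: it assumes a single representative row built from the most frequent value in each column (a simple stand-in for mining the frequent pattern with maximum gain), and it omits the augmented vector quantisation coder entirely. The function names `representative_row`, `compress`, and `decompress` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def representative_row(rows):
    """Build a representative row from the most frequent value per column.

    Assumption: a per-column mode stands in for the mined frequent
    pattern with maximum gain described in the abstract.
    """
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def compress(rows):
    """Encode each row as the cells that differ from the representative row."""
    rep = representative_row(rows)
    encoded = []
    for row in rows:
        # Keep only (column index -> value) pairs that deviate from rep.
        outliers = {i: v for i, v in enumerate(row) if v != rep[i]}
        encoded.append(outliers)
    return rep, encoded


def decompress(rep, encoded):
    """Rebuild the original rows exactly, so the scheme is lossless."""
    rows = []
    for outliers in encoded:
        row = list(rep)
        for i, v in outliers.items():
            row[i] = v
        rows.append(tuple(row))
    return rows


if __name__ == "__main__":
    table = [
        ("NY", "A", 1),
        ("NY", "A", 2),
        ("NY", "B", 1),
        ("LA", "A", 1),
    ]
    rep, encoded = compress(table)
    stored_cells = sum(len(o) for o in encoded)
    print("representative row:", rep)
    print("cells stored as outliers:", stored_cells, "of", len(table) * len(rep))
    assert decompress(rep, encoded) == table
```

In this sketch the saving comes from rows that agree with the representative row in most columns, since only the deviating cells are stored. A real semantic compressor would typically use several representative rows and a compact encoding of the outlier positions.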
