VParC: a compression scheme for numeric data in column-oriented databases

Compression is one of the most important techniques in data management and is commonly used to improve query efficiency in databases. However, existing compression algorithms applied to numeric data in column-oriented databases have several limitations. First, each algorithm suits only columns with certain data distributions, not all kinds of columns; second, a column with an irregular distribution is hard to compress; third, a column compressed with heavyweight methods cannot be operated on before decompression, which leads to inefficient queries. Based on the observation that a column is more likely to exhibit sub-regularity than global regularity, we developed a compression scheme called Vertically Partitioning Compression (VParC). This method suits columns with different data distributions, and in some cases even irregular columns. More importantly, data compressed by VParC can be operated on directly, without prior decompression. This paper presents the details of the compression and query evaluation approaches, and our experimental results demonstrate the promising features of VParC.
