A tunable compression framework for bitmap indices

Bitmap indices are widely used for large read-only repositories in data warehouses and scientific databases. Their binary representation allows the use of bitwise operations and specialized run-length compression techniques. Because of the trade-off between compression and query efficiency, bitmap compression schemes align their encoded units to a fixed encoding length (typically the machine word length) so that queries can run without explicit decompression. In general, smaller encoding lengths provide better compression but require more decoding during query execution. However, when the difference in compressed size is considerable, a smaller encoding can also yield better execution time. We posit that an encoding length tailored to each bit vector will perform better than a one-size-fits-all approach. We present a framework that optimizes compression and query efficiency by allowing bitmaps to be compressed using variable encoding lengths while still maintaining alignment, so explicit decompression is still avoided. We introduce efficient algorithms to process queries over bitmaps compressed with different encoding lengths. An input parameter controls the aggressiveness of the compression, giving the user the ability to tune the trade-off between space and query time. Our empirical study shows that this approach achieves significant improvements in both query time and compression ratio on synthetic and real data sets. Compared to 32-bit WAH, the resulting scheme, VAL-WAH, produces up to 1.8× smaller bitmaps and achieves query times that are 30% faster.
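The effect of the encoding length on compressed size can be illustrated with a minimal sketch. It is not the VAL-WAH algorithm from the paper; it is a simplified WAH-style run-length encoder written for illustration, and the names `encode`, `decode`, `encoded_size_bits`, and the parameter `seg_len` are assumptions introduced here. Each encoded unit is assumed to occupy `seg_len` bits: one flag bit plus either `seg_len - 1` literal bits, or a fill bit and a run counter.

```python
from typing import List, Tuple

# (kind, value, run): kind is "fill" or "lit"; run counts merged groups.
Token = Tuple[str, int, int]


def encode(bits: List[int], seg_len: int) -> List[Token]:
    """Simplified WAH-style encoding with a configurable unit length."""
    if seg_len < 3:
        raise ValueError("seg_len must be at least 3")
    payload = seg_len - 1                 # literal bits per unit
    max_run = (1 << (seg_len - 2)) - 1    # largest run a fill unit can count
    # Pad with zeros so the input splits evenly into payload-sized groups.
    padded = bits + [0] * (-len(bits) % payload)
    tokens: List[Token] = []
    for i in range(0, len(padded), payload):
        group = padded[i:i + payload]
        if all(b == group[0] for b in group):
            # Homogeneous group: extend the previous fill when possible.
            last = tokens[-1] if tokens else None
            if last and last[0] == "fill" and last[1] == group[0] and last[2] < max_run:
                tokens[-1] = ("fill", group[0], last[2] + 1)
            else:
                tokens.append(("fill", group[0], 1))
        else:
            # Mixed group: store its bits verbatim as a literal.
            tokens.append(("lit", int("".join(map(str, group)), 2), 1))
    return tokens


def decode(tokens: List[Token], seg_len: int, n_bits: int) -> List[int]:
    """Inverse of encode; n_bits trims the zero padding."""
    payload = seg_len - 1
    out: List[int] = []
    for kind, value, run in tokens:
        if kind == "fill":
            out.extend([value] * (payload * run))
        else:
            out.extend(int(c) for c in format(value, f"0{payload}b"))
    return out[:n_bits]


def encoded_size_bits(tokens: List[Token], seg_len: int) -> int:
    """Every token occupies exactly one seg_len-bit unit."""
    return len(tokens) * seg_len


# A sparse bit vector: a single set bit surrounded by long runs of zeros.
bits = [0] * 100 + [1] + [0] * 100
sizes = {s: encoded_size_bits(encode(bits, s), s) for s in (8, 32)}
```

For this sparse input both encoding lengths produce three units (fill, literal, fill), so the 8-bit encoding takes 24 bits and the 32-bit encoding takes 96 bits. Denser or longer vectors can shift the balance, since shorter units have smaller run counters and more units must be decoded at query time, which is the trade-off the abstract describes.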
