Position list word aligned hybrid: optimizing space and performance for compressed bitmaps

Compressed bitmap indexes are increasingly used to query very large and complex databases efficiently. The Word Aligned Hybrid (WAH) bitmap compression scheme is commonly recognized as the most CPU-efficient compression scheme. However, WAH compressed bitmaps use a lot of storage space. This paper presents the Position List Word Aligned Hybrid (PLWAH) compression scheme, which improves significantly over WAH compression by better utilizing the available bits and new CPU instructions. For typical bit distributions, PLWAH compressed bitmaps are often half the size of WAH bitmaps and, at the same time, offer even better CPU efficiency. The results are verified by theoretical estimates and extensive experiments on large amounts of both synthetic and real-world data.
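To make the size claim concrete, the following is a minimal sketch that only counts compressed words and does not produce an actual encoding. The abstract does not spell out the word layout, so everything here is an assumption for illustration: 32-bit words with 31 payload bits, fill words that merge runs of identical all-zero or all-one groups, and, for the PLWAH variant, a fill word that can absorb one following literal group when that group differs from the fill pattern in exactly one bit. Counter overflow and the CPU-instruction side of the scheme are not modeled, and the function names are hypothetical.

```python
GROUP_BITS = 31                      # assumed payload bits per 32-bit word
ALL_ONES = (1 << GROUP_BITS) - 1     # a group with every payload bit set


def groups(bits):
    """Split a list of 0/1 values into 31-bit integers, zero-padding the tail."""
    out = []
    for i in range(0, len(bits), GROUP_BITS):
        chunk = bits[i:i + GROUP_BITS]
        chunk = chunk + [0] * (GROUP_BITS - len(chunk))
        out.append(int("".join(map(str, chunk)), 2))
    return out


def classify(group):
    """Label a group as an all-zero fill, an all-one fill, or a literal."""
    if group == 0:
        return "fill0"
    if group == ALL_ONES:
        return "fill1"
    return "literal"


def wah_words(gs):
    """Count words in a WAH-style encoding (fill counter overflow ignored)."""
    words = 0
    prev = None
    for g in gs:
        kind = classify(g)
        # Literals always cost a word; a fill costs one only when it starts a run.
        if kind == "literal" or kind != prev:
            words += 1
        prev = kind
    return words


def differs_in_one_bit(group, fill_kind):
    """True if the group differs from the fill pattern in exactly one bit."""
    base = 0 if fill_kind == "fill0" else ALL_ONES
    diff = group ^ base
    return diff != 0 and (diff & (diff - 1)) == 0


def plwah_words(gs):
    """Count words in a PLWAH-style encoding under the assumptions above."""
    words = 0
    prev = None  # "fill0", "fill1", "literal", or "closed" (fill already absorbed a literal)
    for g in gs:
        kind = classify(g)
        if kind == "literal":
            if prev in ("fill0", "fill1") and differs_in_one_bit(g, prev):
                prev = "closed"      # stored as a position inside the fill word
                continue
            words += 1
            prev = "literal"
        else:
            if kind != prev:
                words += 1           # new fill word
            prev = kind
    return words


# A sparse bitmap: one set bit every 1000 positions.
sparse = [1 if i % 1000 == 0 else 0 for i in range(100_000)]
gs = groups(sparse)
print("WAH words:  ", wah_words(gs))
print("PLWAH words:", plwah_words(gs))
```

On this sparse input almost every literal group holds a single set bit and follows a zero fill, so under these assumptions the PLWAH count comes out at roughly half the WAH count. Denser or more clustered bitmaps would show a smaller gap.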
