Compressing bitmap indexes for faster search operations

We study the effects of compression on bitmap indexes. During query processing, the main operations on the bitmaps are bitwise logical operations. With general-purpose compression schemes, logical operations on the compressed bitmaps are much slower than on uncompressed bitmaps. Specialized compression schemes, such as the byte-aligned bitmap code (BBC), usually perform logical operations faster than general-purpose schemes, but in many cases they are still orders of magnitude slower than the uncompressed scheme. To make compressed bitmap indexes operate more efficiently, we designed a CPU-friendly scheme that we call the word-aligned hybrid code (WAH). Tests on both synthetic and real application data show that the new scheme significantly outperforms well-known compression schemes at the cost of a modest increase in storage space. Compared to BBC, WAH performs logical operations about 12 times faster and uses only 60% more space. Compared to the uncompressed scheme, WAH is faster in most test cases while still using less space. Additional tests verified that the improvement in logical operation speed translates into a similar improvement in query processing speed.
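The abstract names WAH but does not spell out its word layout. As an illustration only, here is a minimal Python sketch that assumes the commonly described 32-bit layout: a word with its most significant bit clear is a literal carrying 31 bitmap bits, and a word with that bit set is a fill whose next bit gives the fill value and whose low 30 bits count how many 31-bit groups it covers. The function names (`wah_encode`, `wah_decode`, `wah_and`) are hypothetical. The last partial group is zero-padded instead of being kept as a separate "active" word. This is not the authors' implementation. The point it tries to show is that the AND runs directly on the compressed words: two long fills are combined in a single step rather than group by group.

```python
from typing import Iterator, List, Tuple

# Assumed 32-bit WAH layout (illustrative sketch, not the paper's code):
#   MSB = 0 -> literal word holding 31 bitmap bits
#   MSB = 1 -> fill word; next bit = fill value, low 30 bits = number of 31-bit groups
WORD_BITS = 32
GROUP = WORD_BITS - 1               # bitmap bits per group / literal word
FILL_FLAG = 1 << (WORD_BITS - 1)    # marks a fill word
FILL_BIT = 1 << (WORD_BITS - 2)     # value of the bits in a fill
MAX_RUN = FILL_BIT - 1              # largest group count a single fill can hold
ALL_ONES = (1 << GROUP) - 1


def _runs(words: List[int]) -> Iterator[Tuple[int, int]]:
    """Yield (31-bit group value, repeat count) without expanding fills."""
    for w in words:
        if w & FILL_FLAG:
            yield (ALL_ONES if w & FILL_BIT else 0), w & MAX_RUN
        else:
            yield w, 1


def _append(out: List[int], value: int, count: int) -> None:
    """Append `count` copies of a group, merging all-0/all-1 groups into fill words."""
    if value not in (0, ALL_ONES):
        out.extend([value] * count)
        return
    fill = FILL_BIT if value else 0
    while count > 0:
        last = out[-1] if out else 0
        if (last & FILL_FLAG) and (last & FILL_BIT) == fill and (last & MAX_RUN) < MAX_RUN:
            step = min(count, MAX_RUN - (last & MAX_RUN))
            out[-1] += step          # extend the previous fill of the same value
        else:
            step = min(count, MAX_RUN)
            out.append(FILL_FLAG | fill | step)
        count -= step


def wah_encode(bits: List[int]) -> List[int]:
    """Compress a list of 0/1 values; the final partial group is zero-padded."""
    out: List[int] = []
    for start in range(0, len(bits), GROUP):
        chunk = bits[start:start + GROUP]
        value = 0
        for b in chunk + [0] * (GROUP - len(chunk)):
            value = (value << 1) | b  # first bit of the chunk becomes the high bit
        _append(out, value, 1)
    return out


def wah_decode(words: List[int], nbits: int) -> List[int]:
    """Expand WAH words back to a 0/1 list of length nbits."""
    bits: List[int] = []
    for value, count in _runs(words):
        group = [(value >> (GROUP - 1 - i)) & 1 for i in range(GROUP)]
        bits.extend(group * count)
    return bits[:nbits]


def wah_and(x: List[int], y: List[int]) -> List[int]:
    """Bitwise AND of two equal-length WAH bitmaps, consumed run by run."""
    out: List[int] = []
    xs, ys = _runs(x), _runs(y)
    xv, xn = next(xs, (0, 0))
    yv, yn = next(ys, (0, 0))
    while xn and yn:
        n = min(xn, yn)             # both operands are constant for the next n groups
        _append(out, xv & yv, n)    # fill AND fill costs one step, not n
        xn -= n
        yn -= n
        if xn == 0:
            xv, xn = next(xs, (0, 0))
        if yn == 0:
            yv, yn = next(ys, (0, 0))
    return out


if __name__ == "__main__":
    # 62 ones then 155 zeros: one 1-fill of 2 groups, one 0-fill of 5 groups
    print([hex(w) for w in wah_encode([1] * 62 + [0] * 155)])  # ['0xc0000002', '0x80000005']
```

The sketch always turns an all-0 or all-1 group into a fill word, even when the group stands alone. Real encoders may make different choices here and in how they handle the trailing partial group.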
