An efficient compression scheme for bitmap indices

When an out-of-core indexing method is used to answer a query, it is generally assumed that the I/O cost dominates the overall query response time. Because of this, most research on indexing methods concentrates on reducing the sizes of indices. For bitmap indices, compression has been used for this purpose. However, in most cases, operations on these compressed bitmaps, mostly bitwise logical operations such as AND, OR, and NOT, spend more time in CPU than in I/O. To speed up these operations, a number of specialized bitmap compression schemes have been developed, the best known of which is the byte-aligned bitmap code (BBC). These schemes usually perform logical operations faster than general-purpose compression schemes, but the time spent in CPU still dominates the total query response time. To reduce the query response time, we designed a CPU-friendly scheme named the word-aligned hybrid (WAH) code. In this paper, we prove that the sizes of WAH compressed bitmap indices are about two words per row for a large range of attributes. This size is smaller than typical sizes of commonly used indices, such as a B-tree. Therefore, WAH compressed indices are appropriate not only for low cardinality attributes but also for high cardinality attributes. In the worst case, the time to operate on compressed bitmaps is proportional to the total size of the bitmaps involved. The total size of the bitmaps required to answer a query on one attribute is proportional to the number of hits. Together, these two results indicate that WAH compressed bitmap indices are optimal, since the worst-case query time is then proportional to the number of hits. To verify their effectiveness, we generated bitmap indices for four different datasets and measured the response time of many range queries. Tests confirm that the sizes of compressed bitmap indices are indeed smaller than those of B-tree indices, and that query processing with WAH compressed indices is much faster than with BBC compressed indices, projection indices, and B-tree indices. We also verified that the average query response time is proportional to the index size. This indicates that the compressed bitmap indices are efficient for very large datasets.
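
The abstract does not spell out the WAH word layout, so the following is only a minimal sketch under stated assumptions: 32-bit words, where a literal word has its most significant bit clear and carries 31 bitmap bits, and a fill word has that bit set, uses the next bit as the repeated value, and stores a count of 31-bit groups in the remaining 30 bits. The names `wah_encode`, `wah_decode`, and `wah_and` are hypothetical and chosen for illustration. The sketch shows why a logical operation can run in time proportional to the compressed size: `wah_and` walks the two word lists run by run instead of bit by bit.

```python
GROUP = 31                       # bitmap bits carried per 32-bit word (assumed layout)
ALL_ONES = (1 << GROUP) - 1
FILL_FLAG = 1 << 31              # most significant bit set marks a fill word
FILL_BIT = 1 << 30               # next bit holds the repeated bit value
MAX_RUN = (1 << 30) - 1          # largest group count one fill word can hold


def _append(words, group, count=1):
    """Append `count` copies of a 31-bit group, merging adjacent fills."""
    if group not in (0, ALL_ONES):
        words.append(group)      # literal word; count is always 1 for literals
        return
    fill = FILL_FLAG | (FILL_BIT if group else 0)
    if words and (words[-1] & ~MAX_RUN) == fill:
        take = min(MAX_RUN - (words[-1] & MAX_RUN), count)
        words[-1] += take        # extend the previous fill of the same value
        count -= take
    while count > 0:
        take = min(count, MAX_RUN)
        words.append(fill | take)
        count -= take


def wah_encode(bits):
    """Encode a list of 0/1 ints; input is zero-padded to a multiple of 31."""
    padded = list(bits) + [0] * (-len(bits) % GROUP)
    words = []
    for i in range(0, len(padded), GROUP):
        group = 0
        for b in padded[i:i + GROUP]:
            group = (group << 1) | b
        _append(words, group)
    return words


def _runs(words):
    """Yield (group_value, group_count) pairs from encoded words."""
    for w in words:
        if w & FILL_FLAG:
            yield (ALL_ONES if w & FILL_BIT else 0), w & MAX_RUN
        else:
            yield w, 1


def wah_decode(words, nbits):
    """Expand encoded words back into the first `nbits` bits."""
    bits = []
    for value, count in _runs(words):
        group = [(value >> s) & 1 for s in range(GROUP - 1, -1, -1)]
        bits.extend(group * count)
    return bits[:nbits]


def wah_and(a, b):
    """AND two encoded bitmaps of equal length, one run at a time."""
    out = []
    ia, ib = _runs(a), _runs(b)
    va, ca = next(ia, (0, 0))
    vb, cb = next(ib, (0, 0))
    while ca and cb:
        n = min(ca, cb)          # groups that both current runs cover
        _append(out, va & vb, n)
        ca -= n
        cb -= n
        if not ca:
            va, ca = next(ia, (0, 0))
        if not cb:
            vb, cb = next(ib, (0, 0))
    return out
```

For example, a bitmap of 310 zeros followed by the bits 1, 0, 1 encodes to two words under these assumptions: one fill word covering ten all-zero groups and one literal word. A production implementation would also track the number of valid bits in the last word rather than relying on zero padding.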
