Multi-resolution bitmap indexes for scientific data

The unique characteristics of scientific data and queries cause traditional indexing techniques to perform poorly on scientific workloads, occupy excessive space, or both. Refinements of bitmap indexes have previously been proposed as a solution to this problem. In this article, we describe the difficulties we encountered in deploying bitmap indexes with scientific data and queries from two real-world domains. In particular, previously proposed methods of binning, encoding, and compressing bitmap vectors were either quite slow at processing the large-range query conditions our scientists used or required excessive storage space, and the indexes could not easily be built or used on parallel platforms. We show how to solve these problems with multi-resolution, parallelizable bitmap indexes, which support a fine-grained trade-off between storage requirements and query performance. Our experiments with large data sets from two scientific domains show that these indexes occupy an acceptable amount of storage while improving range query performance by roughly a factor of 10 compared to a single-resolution bitmap index of reasonable size.
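To illustrate the general idea of answering a range query from bitmaps at more than one resolution, here is a minimal sketch. It is not the article's implementation: the class name `MultiResolutionBitmapIndex`, the two-level layout, the equal-width bins, and the use of uncompressed Python integers as bit vectors are all assumptions made for illustration. Wide ranges are covered mostly by coarse bins, and fine bins are used only near the range boundaries.

```python
class MultiResolutionBitmapIndex:
    """Toy two-level binned bitmap index over integers in [0, domain).

    Hypothetical illustration only: each bitmap is an uncompressed Python int
    whose bit `row` is set when that row's value falls in the bin.
    """

    def __init__(self, values, domain, fine_width, coarse_factor):
        self.domain = domain
        self.fine_width = fine_width
        self.coarse_width = fine_width * coarse_factor
        # Ceiling division gives the number of bins at each resolution.
        self.fine = [0] * (-(-domain // self.fine_width))
        self.coarse = [0] * (-(-domain // self.coarse_width))
        for row, v in enumerate(values):
            self.fine[v // self.fine_width] |= 1 << row
            self.coarse[v // self.coarse_width] |= 1 << row

    def range_query(self, lo, hi):
        """Return (bitmap, bitmaps_read) for rows with lo <= value < hi.

        Assumes lo and hi are multiples of fine_width, so no per-row
        candidate check is needed at the range boundaries.
        """
        assert lo % self.fine_width == 0 and hi % self.fine_width == 0
        assert 0 <= lo <= hi <= self.domain
        result, reads, pos = 0, 0, lo
        while pos < hi:
            if pos % self.coarse_width == 0 and pos + self.coarse_width <= hi:
                # A whole coarse bin fits inside the range: read one bitmap.
                result |= self.coarse[pos // self.coarse_width]
                pos += self.coarse_width
            else:
                # Near a range boundary: fall back to the fine resolution.
                result |= self.fine[pos // self.fine_width]
                pos += self.fine_width
            reads += 1
        return result, reads


def rows_of(bitmap):
    """Decode a bitmap into the sorted list of row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


values = [3, 17, 42, 58, 63, 99, 0, 25]
index = MultiResolutionBitmapIndex(values, domain=100, fine_width=5, coarse_factor=4)
bitmap, reads = index.range_query(5, 65)
print(rows_of(bitmap), reads)  # [1, 2, 3, 4, 7] 6
```

In this toy setting the query over [5, 65) reads 6 bitmaps, whereas the fine-resolution index alone would read 12. The price is the extra storage for the coarse bitmaps, which is the kind of storage versus query-time trade-off the abstract refers to.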
