Optimizing Queries on Compressed Bitmaps

OptimizingQueriesOnCompressedBitmapsSihem Amer-YahiaAT&T Labs{Researchsihem@research.att.comTheo doreJohnsonjohnsont@research.att.comAbstractBitmap indices are used by DBMS's to accelerate decision supp ort queries.A signi cant advantage ofbitmap indices is that complex logical selection op erations can b e p erformed very quickly, by p erformingbit-wiseAND,OR,andNOTop erators.Althoughbitmapindicescanb espaceinecientforhighcardinalityattributes,the space use of compressed bitmapscompares well to other indexingmetho ds.Oracle and Sybase IQ are two commercial pro ducts that make extensive use of compressed bitmap indices.Our recent research showed that there are several fast algorithmsfor evaluatingBo oleanop eratorson compressedbitmaps.Dep endingon the natureof the op erandbitmaps(theirformat, densityandclusterdness) and the op eration to b e p erformed (AND, NOT, ...), these algorithms can have executiontimes that are orders of magnitude di erent.Cho osing an algorithm for p erforming a Bo olean op erationhas global e ects in the Bo olean query expression, requiring global optimization.We present a linear timedynamicprogrammingsearch strategy based on a cost mo delto optimizequeryexpressionevaluationplans.We alsopresentrewritingheuristicsthat rewritethe queryexpressionto anequivalenonetoencourage b etter algorithmsassignments.Our p erformance results show that the optimizerrequiresanegligibl e amount of time to execute, and that optimized complex queries can execute up to three timesfaster than unoptimized queries on real data.1Intro ductionAbitmap indexis a bit string in which each bit is mapp ed to a record ID (RID) of a relation.A bit in thebitmap index is set (to 1) if the corresp onding RID has prop ertyP(i.e., the RID represents a customer thatlives in New York), and is reset (to 0) otherwise.In typical usage, the predicatePis true for a record if it hasthe valueafor attributeA.One such predicate is asso ciated to one bitmap index for each unique value ofthe attributeA.The predicates can b e more complex, for example bitslice indices [OQ97] and precomputedcomplex selection predicates [HEP99].Oneadvantageofbitmapindicesisthatcomplexselectionpredicatescanb ecomputedveryquickly,by p erforming bit-wiseAND, OR, andNOTop erations on the bitmap indices.Furthermore, the indexableselection predicates can involve many attributes.Let's consider some examples, using a customer databasewith schemaCustomer(Name, Livesin, Worksin, Car, Numberofchildren, Hascable, Hasel lular)Supp osethatweanttoselectallcustomerswholivinNewEngland.Thentheselectioncon-ditionisLivesin=\ME"ORin=\VT"in=\NH"in=\MA"in=\CT"ORLivesin=\RI"in=\NY". Since a bitmap index is createdfor each value of the attributeLivesin, the query translates into mapping the attribute to all its p ossible values.1

[1]  Toshihide Ibaraki,et al.  On the optimal nesting order for computing N-relational joins , 1984, TODS.

[2]  Ming-Chuan Wu,et al.  Query optimization for selections using bitmaps , 1999, SIGMOD '99.

[3]  Alistair Moffat,et al.  Parameterised compression for sparse bitmaps , 1992, SIGIR '92.

[4]  Rick Greer,et al.  Daytona and the fourth-generation language Cymbal , 1999, SIGMOD '99.

[5]  Patrick E. O'Neil,et al.  Improved query performance with variant indexes , 1997, SIGMOD '97.

[6]  G. Antoshenkov,et al.  Byte-aligned bitmap compression , 1995, Proceedings DCC '95 Data Compression Conference.

[7]  Yannis E. Ioannidis,et al.  Query optimization , 1996, CSUR.

[8]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[9]  Martin Schaller Reclustering of high energy physics data , 1999, Proceedings. Eleventh International Conference on Scientific and Statistical Database Management.

[10]  Goetz Graefe,et al.  Multi-table joins through bitmapped join indices , 1995, SGMD.

[11]  Michael V. Mannino,et al.  Statistical profile estimation in database systems , 1988, CSUR.

[12]  Arie Shoshani,et al.  Multidimensional indexing and query coordination for tertiary storage management , 1999, Proceedings. Eleventh International Conference on Scientific and Statistical Database Management.

[13]  E. M. Hartwell Boston , 1906 .

[14]  Eugene Wong,et al.  Query optimization by simulated annealing , 1987, SIGMOD '87.

[15]  Yannis E. Ioannidis,et al.  Bitmap index design and evaluation , 1998, SIGMOD '98.

[16]  Theodore Johnson,et al.  Performance Measurements of Compressed Bitmap Indices , 1999, VLDB.

[17]  Matthias Jarke,et al.  Query Optimization in Database Systems , 1984, CSUR.

[18]  Yannis E. Ioannidis,et al.  Randomized algorithms for optimizing large join queries , 1990, SIGMOD '90.

[19]  Yannis E. Ioannidis,et al.  An efficient bitmap encoding scheme for selection queries , 1999, SIGMOD '99.

[20]  Arun N. Swami,et al.  Optimization of large join queries , 1988, SIGMOD '88.

[21]  Alejandro P. Buchmann,et al.  Encoded bitmap indexing for data warehouses , 1998, Proceedings 14th International Conference on Data Engineering.