UpBit: Scalable In-Memory Updatable Bitmap Indexing

Bitmap indexes are widely used in both scientific and commercial databases. They bring fast read performance for specific types of queries, such as equality and selective range queries. A major drawback of bitmap indexes, however, is that supporting updates is particularly costly. Bitmap indexes are kept compressed to minimize storage footprint; as a result, updating a bitmap index requires the expensive step of decoding and then re-encoding a bitvector. Today, more and more applications need support for both reads and writes, blurring the boundaries between analytical processing and transaction processing. This requires new system designs and access methods that support general updates and, at the same time, offer competitive read performance. In this paper, we propose scalable in-memory Updatable Bitmap indexing (UpBit), which offers efficient updates without hurting read performance. UpBit relies on two design points. First, in addition to the main bitvector for each domain value, UpBit maintains an update bitvector to keep track of updated values. Effectively, every update can now be directed to a highly compressible, easy-to-update bitvector. While update bitvectors double the amount of uncompressed data, they are sparse, and as a result their compressed size is small. Second, we introduce fence pointers in all update bitvectors, which allow for efficient retrieval of a value at an arbitrary position. Using both synthetic and real-life data, we demonstrate that UpBit significantly outperforms state-of-the-art bitmap indexes for workloads that contain both reads and writes. In particular, compared to update-optimized bitmap index designs, UpBit is 15-29x faster in terms of update time and 2.7x faster in terms of read performance. In addition, compared to read-optimized bitmap index designs, UpBit achieves efficient and scalable updates (51-115x lower update latency) while offering comparable read performance, with at most 8% overhead.
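
The two design points can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the paper's implementation: bitvectors are plain uncompressed Python integers, the logical bitvector of a value is assumed to be the XOR of its main and update bitvectors (the abstract does not spell out how the two are combined), and the fence-pointer part uses simple run-length runs as a stand-in for a real compressed encoding. All names (`UpdatableBitmap`, `build_fences`, `get_bit`) are hypothetical.

```python
from bisect import bisect_right


class UpdatableBitmap:
    """Toy sketch: one main bitvector plus one update bitvector per value."""

    def __init__(self, column, domain):
        self.n = len(column)
        # Main bitvectors: bit i of vb[v] is set when row i holds value v.
        self.vb = {v: 0 for v in domain}
        # Update bitvectors start as all zeros, so they are sparse.
        self.ub = {v: 0 for v in domain}
        for row, v in enumerate(column):
            self.vb[v] |= 1 << row

    def _current(self, v):
        # Assumption: logical bitvector = main XOR update.
        return self.vb[v] ^ self.ub[v]

    def value_at(self, row):
        for v in self.vb:
            if (self._current(v) >> row) & 1:
                return v
        raise KeyError(row)

    def update(self, row, new_value):
        old_value = self.value_at(row)
        if old_value == new_value:
            return
        # Flip one bit in two update bitvectors; main bitvectors are untouched.
        self.ub[old_value] ^= 1 << row
        self.ub[new_value] ^= 1 << row

    def rows_equal(self, v):
        bits = self._current(v)
        return [r for r in range(self.n) if (bits >> r) & 1]


def build_fences(runs, every):
    """Sample the start position of every `every`-th run in `runs`.

    `runs` is a list of (bit, length) pairs, a stand-in for a compressed
    bitvector. Returns (start_positions, run_indexes).
    """
    starts, idxs, pos = [], [], 0
    for i, (_, length) in enumerate(runs):
        if i % every == 0:
            starts.append(pos)
            idxs.append(i)
        pos += length
    return starts, idxs


def get_bit(runs, fences, pos):
    """Read the bit at `pos` by jumping to the nearest fence, then scanning."""
    starts, idxs = fences
    k = bisect_right(starts, pos) - 1  # last fence at or before pos
    cur, i = starts[k], idxs[k]
    while cur + runs[i][1] <= pos:
        cur += runs[i][1]
        i += 1
    return runs[i][0]


index = UpdatableBitmap(["a", "b", "a", "c"], domain="abc")
index.update(2, "b")           # row 2 changes from "a" to "b"
print(index.rows_equal("b"))   # rows currently holding "b"

runs = [(0, 5), (1, 1), (0, 10), (1, 2), (0, 3)]
fences = build_fences(runs, every=2)
print(get_bit(runs, fences, 16))
```

In this sketch an update only flips bits in the two affected update bitvectors, and a point lookup decodes only the runs between the nearest fence and the target position instead of scanning from the start. `get_bit` assumes `pos` lies within the encoded length.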
