Partitioned Blockmap Indexes for Multidimensional Data Access

Given recent increases in the size of main memory in modern machines, it is now common to store large data sets in RAM for faster processing. Multidimensional access methods aim to provide efficient access to large data sets when queries apply predicates to some of the data dimensions. We examine multidimensional access methods in the context of an in-memory column store tuned for on-line analytical processing or scientific data analysis. We propose a multidimensional data structure that contains a novel combination of a grid array and several bitmaps. The base data is clustered in an order matching that of the index structure. The bitmaps contain one bit per block of data, motivating the term “blockmap.” The proposed data structures are compact, typically taking less than one bit of space per row of data. Partition boundaries can be chosen in a way that reflects both the query workload and the data distribution, and boundaries are not required to divide the data evenly if there is a bias in the query distribution. We examine the theoretical performance of the data structure and experimentally measure its performance on three modern CPUs and one GPU. We demonstrate that efficient multidimensional access can be achieved with minimal space overhead.
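
To make the one-bit-per-block idea concrete, the following is a minimal sketch and not the structure defined in the paper. It assumes one bitmap per (dimension, partition) pair, with bit b set when block b holds at least one row in that partition, and it omits the grid array entirely. The names `partition_of`, `build_blockmaps`, `candidate_blocks`, and `bits_per_row` are hypothetical, as are the block size and boundaries used in the usage note.

```python
from bisect import bisect_right


def partition_of(value, boundaries):
    """Return the partition index of value, given sorted split points.

    Assumption: partition p covers boundaries[p-1] <= value < boundaries[p].
    """
    return bisect_right(boundaries, value)


def build_blockmaps(rows, boundaries_per_dim, block_size):
    """Cluster rows by grid cell and build one bitmap per (dimension, partition).

    Each bitmap is a Python int; bit b is set if block b contains at least
    one row whose value in that dimension falls in that partition.
    """
    def cell(row):
        return tuple(partition_of(v, b) for v, b in zip(row, boundaries_per_dim))

    # Cluster the base data in an order that matches the index (cell order).
    clustered = sorted(rows, key=cell)
    blockmaps = [[0] * (len(b) + 1) for b in boundaries_per_dim]
    for i, row in enumerate(clustered):
        block = i // block_size
        for d, (v, b) in enumerate(zip(row, boundaries_per_dim)):
            blockmaps[d][partition_of(v, b)] |= 1 << block
    num_blocks = (len(clustered) + block_size - 1) // block_size
    return clustered, blockmaps, num_blocks


def candidate_blocks(blockmaps, boundaries_per_dim, num_blocks, ranges):
    """Return indices of blocks that may hold rows matching all range predicates.

    ranges[d] is an inclusive (lo, hi) pair, or None for an unconstrained
    dimension. Blocks returned still need a row-level check.
    """
    result = (1 << num_blocks) - 1  # start with every block as a candidate
    for d, rng in enumerate(ranges):
        if rng is None:
            continue
        lo, hi = rng
        p_lo = partition_of(lo, boundaries_per_dim[d])
        p_hi = partition_of(hi, boundaries_per_dim[d])
        mask = 0
        for p in range(p_lo, p_hi + 1):  # OR the partitions the range touches
            mask |= blockmaps[d][p]
        result &= mask  # AND across dimensions
    return [b for b in range(num_blocks) if (result >> b) & 1]


def bits_per_row(blockmaps, num_blocks, num_rows):
    """Index size in bits divided by the number of rows."""
    num_bitmaps = sum(len(per_dim) for per_dim in blockmaps)
    return num_bitmaps * num_blocks / num_rows
```

As a usage illustration under the same assumptions: with 64 two-dimensional rows `(x, y)` for `x, y` in `0..7`, a single split point at 4 in each dimension, and 16 rows per block, each block coincides with one grid cell. A query for `x` in `[0, 3]` and `y` in `[5, 7]` then touches a single block, and the four bitmaps of four bits each cost 16 bits for 64 rows. Larger blocks lower the space per row further, at the cost of coarser filtering.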
