BB-Tree: A practical and efficient main-memory index structure for multidimensional workloads

We present the BB-Tree, a fast and space-efficient index structure for processing multidimensional read/write workloads in main memory. The BB-Tree uses a k-ary search tree for pruning and searching while keeping all data in leaf nodes. It linearizes the inner search tree and manages it in a cache-optimized array, creating the need for occasional re-organizations when data changes. To reduce the frequency of such re-organizations, the BB-Tree introduces a novel architecture for leaf nodes, called bubble buckets, which can automatically morph between different representations depending on their fill degree and are thus able to buffer a large number of insertions or deletions in-place. We compare the BB-Tree to scanning, to main-memory variants of the R*-tree, the kd-tree, and the VA-file, and to the recent PH-tree, using different multidimensional workloads over real and synthetic data sets. The BB-Tree is the fastest access method for range queries up to a selectivity of around 20% (beyond which only scanning is faster), the fastest method in read/write workloads, and achieves an exact-match query performance similar to that of the best point access method. In addition, it is the most space-efficient of all considered index structures. We also describe a parallel range query operator and show that it scales with the number of physical cores.
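To make the idea of a linearized k-ary search tree concrete, the following is a minimal one-dimensional sketch, not the BB-Tree implementation itself. It assumes a complete tree whose inner nodes are stored in breadth-first order in one flat array, with each node holding k-1 separator keys. The names `build_linearized` and `find_leaf`, the array layout, and the restriction to a single dimension are illustrative assumptions; the abstract does not specify these details.

```python
from bisect import bisect_right


def build_linearized(sorted_seps, k, depth):
    """Lay out a complete k-ary search tree in one flat array (BFS order).

    Assumption for this sketch: exactly k**depth - 1 sorted separators,
    so every inner node holds k - 1 keys and there are k**depth leaves.
    """
    if len(sorted_seps) != k ** depth - 1:
        raise ValueError("need exactly k**depth - 1 separators")
    n_inner = (k ** depth - 1) // (k - 1)
    keys = [None] * (n_inner * (k - 1))
    it = iter(sorted_seps)

    def fill(node, level):
        # In-order walk: child 0, key 0, child 1, key 1, ..., child k-1.
        if level == depth:
            return  # reached the leaf level, nothing to store
        for j in range(k):
            fill(node * k + 1 + j, level + 1)
            if j < k - 1:
                keys[node * (k - 1) + j] = next(it)

    fill(0, 0)
    return keys


def find_leaf(keys, k, depth, q):
    """Return the index (0-based, left to right) of the leaf covering q."""
    node = 0
    for _ in range(depth):
        base = node * (k - 1)
        # Number of separators in this node that are <= q picks the child.
        j = bisect_right(keys, q, base, base + k - 1) - base
        node = node * k + 1 + j
    first_leaf = (k ** depth - 1) // (k - 1)
    return node - first_leaf


seps = [10, 20, 30, 40, 50, 60, 70, 80]
tree = build_linearized(seps, k=3, depth=2)
print(tree)                        # [30, 60, 10, 20, 40, 50, 70, 80]
print(find_leaf(tree, 3, 2, 45))   # 4
```

Because child positions are computed from the parent index, the inner tree needs no pointers and each node's keys sit contiguously in memory. The flip side is that changing the separators means rebuilding the array, which is the kind of re-organization the abstract says bubble buckets are meant to make rare.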
