Rethinking Insertions to B+-Trees on Coupled CPU-GPU Architectures

The B+-tree is a well-known data structure used as an index in database systems, online analytical processing, and similar applications. Using GPUs, which provide large amounts of computing resources, to accelerate indexing is an attractive choice. In this paper, we revisit batch insertions to B+-trees on coupled CPU-GPU architectures to exploit the computing power of co-processors. First, we design a phase-based bulk insertion method that can run on a single processor, either the CPU or the integrated GPU. The insertion process is divided into six phases that proceed sequentially. Each work group is responsible for inserting keys into a single leaf node, which reduces irregular memory accesses within the group. Second, we propose a co-processing batch insertion algorithm that uses the CPU and the integrated GPU simultaneously. Based on the characteristics of the processors and the tasks, the sorting and calculation tasks are assigned to the CPU, while searching and insertion into leaf nodes are performed jointly by the two processors. In addition, a pipeline strategy overlaps sorting the next batch of keys with updating nodes for the current batch. Our experimental study shows that the phase-based insertion method on the CPU and on the integrated GPU provides speedups of up to 2.55x and 2.27x, respectively, over PALM, a parallel latch-free B+-tree designed for multi-core processors. The co-processing algorithm further improves performance by up to 33% over the CPU method and 63% over the GPU method. To the best of our knowledge, this paper is the first effort to consider both the CPU and the integrated GPU in redesigning B+-tree insertions.
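The idea of sorting a batch and handing all keys destined for the same leaf to one work group can be illustrated with a minimal sequential sketch. It is not the six-phase method itself: it assumes a hypothetical flat layout in which `leaves` is a list of sorted key lists and `separators[i]` is the smallest key that belongs to leaf `i + 1`, and it omits node splits, inner-node updates, and any CPU-GPU scheduling. The function names are illustrative only.

```python
from bisect import bisect_right, insort
from itertools import groupby


def group_keys_by_leaf(separators, batch):
    """Sort a batch and group its keys by target leaf index.

    Assumption: separators[i] is the smallest key stored in leaf i + 1,
    so bisect_right gives the index of the leaf a key belongs to.
    """
    ordered = sorted(batch)  # sorting step

    def find_leaf(key):
        return bisect_right(separators, key)  # search step

    # Sorted keys map to non-decreasing leaf indices, so each leaf's keys
    # form one contiguous run that groupby can collect.
    return {leaf: list(keys) for leaf, keys in groupby(ordered, key=find_leaf)}


def batch_insert(leaves, separators, batch):
    """Insert a batch of keys, one group of keys per leaf.

    Each (leaf, keys) pair is the unit of work that one work group
    would handle; here the groups are simply processed in a loop.
    """
    groups = group_keys_by_leaf(separators, batch)
    for leaf_index, keys in groups.items():
        leaf = leaves[leaf_index]
        for key in keys:
            insort(leaf, key)  # keep the leaf sorted; no split handling
    return groups


# Three leaves covering the key ranges (-inf, 10), [10, 20), [20, +inf).
example_leaves = [[1, 5], [10, 15], [20, 30]]
example_separators = [10, 20]
example_groups = batch_insert(example_leaves, example_separators,
                              [22, 3, 12, 7, 25, 11])
```

Because every group touches exactly one leaf, the groups are independent of each other, which is what makes it plausible to assign one group to one work group in a parallel setting.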