Improving Data Partitioning Performance on OpenCL-Based FPGAs

We investigate the performance of relational database applications on recent OpenCL-based FPGAs. As a first step, we study the performance of data partitioning, a core operation widely used in relational databases. Because of its random memory accesses, data partitioning is time-consuming and can become a major bottleneck for database operators such as hash joins. We start with the state-of-the-art OpenCL implementation, which was originally designed for CPUs and GPUs, and find that it suffers from lock overhead and memory stalls. To remove these overheads, we develop a simple yet efficient multi-kernel approach that leverages two emerging features of the Altera OpenCL SDK, namely task kernels and channels. We evaluate the proposed design on a recent Altera Stratix V GX FPGA. Our results demonstrate that the proposed approach achieves a speedup of roughly 10.7X over the state-of-the-art OpenCL implementation.
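For readers unfamiliar with the operation, the following is a minimal sequential sketch in C of radix-style data partitioning (histogram, prefix sum, scatter). It is an illustration only, not the multi-kernel FPGA design proposed here, and it assumes 32-bit keys partitioned on their low-order bits; the names `part_of` and `radix_partition` are hypothetical. The scatter loop at the end is where the random memory writes mentioned above arise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: partition index taken from the low `bits` bits of the key. */
static inline uint32_t part_of(uint32_t key, unsigned bits) {
    return key & ((1u << bits) - 1u);
}

/*
 * Group the n keys of `in` by partition and write them to `out`.
 * `starts` must hold (1 << bits) + 1 entries; on return, starts[p] is the
 * offset of partition p in `out`, and starts[1 << bits] equals n.
 * Returns 0 on success and -1 if the scratch allocation fails.
 */
int radix_partition(const uint32_t *in, uint32_t *out, size_t n,
                    unsigned bits, size_t *starts) {
    assert(bits < 32); /* keeps the shift in part_of well defined */
    size_t fanout = (size_t)1 << bits;
    size_t *cursor = calloc(fanout, sizeof *cursor);
    if (cursor == NULL) {
        return -1;
    }

    /* Pass 1: histogram, counting the keys that fall in each partition. */
    for (size_t i = 0; i < n; i++) {
        cursor[part_of(in[i], bits)]++;
    }

    /* Exclusive prefix sum: turn the counts into starting offsets. */
    size_t sum = 0;
    for (size_t p = 0; p < fanout; p++) {
        size_t count = cursor[p];
        cursor[p] = sum;
        starts[p] = sum;
        sum += count;
    }
    starts[fanout] = sum;

    /* Pass 2: scatter. Consecutive keys may land in distant partitions,
       so the writes to `out` follow a random access pattern. */
    for (size_t i = 0; i < n; i++) {
        out[cursor[part_of(in[i], bits)]++] = in[i];
    }

    free(cursor);
    return 0;
}
```

With `bits = 2`, for instance, the keys are split into four partitions, and keys within a partition keep their input order. In a parallel CPU/GPU version, multiple work-items would share the per-partition cursors, which is one place where synchronization cost of the kind described above can appear.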