We investigate the performance of relational database applications on recent OpenCL-based FPGAs. As a first step, we study data partitioning, a core operation widely used in relational databases. Because of its random memory accesses, data partitioning is time-consuming and can become a major bottleneck for database operators such as hash joins. We start from the state-of-the-art OpenCL implementation, which was originally designed for CPUs and GPUs, and find that it suffers from lock overhead and memory stalls. To remove these overheads, we develop a simple yet efficient multi-kernel approach that leverages two emerging features of the Altera OpenCL SDK, namely task kernels and channels. We evaluate the proposed design on a recent Altera Stratix V GX FPGA. Our results show that the proposed approach achieves roughly a 10.7X speedup over the state-of-the-art OpenCL implementation.
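To make the partitioning operation concrete, the following is a minimal sequential sketch in Python. It assumes a radix-style scheme that assigns each key to a partition by its low-order bits; the abstract does not state the partitioning function, and the names `radix_partition` and `radix_bits` are hypothetical. It is not the OpenCL implementation evaluated here. It only illustrates why the scatter step produces random memory accesses, and why a parallel version would need to synchronize updates to the per-partition write cursors.

```python
from typing import List, Tuple


def radix_partition(keys: List[int], radix_bits: int) -> Tuple[List[int], List[int]]:
    """Group keys by their low-order `radix_bits` bits.

    Returns (out, offsets), where partition p occupies
    out[offsets[p]:offsets[p + 1]].
    """
    fanout = 1 << radix_bits   # number of partitions
    mask = fanout - 1          # selects the low-order bits of a key

    # Pass 1: count how many keys fall into each partition.
    hist = [0] * fanout
    for k in keys:
        hist[k & mask] += 1

    # Exclusive prefix sum: start offset of each partition in the output.
    offsets = [0] * (fanout + 1)
    for p in range(fanout):
        offsets[p + 1] = offsets[p] + hist[p]

    # Pass 2: scatter. Consecutive input keys may land in distant output
    # slots, which is the source of the random memory accesses. In a
    # parallel version, cursor[p] is shared state that would need an
    # atomic update or a lock.
    cursor = offsets[:fanout]  # slicing copies, so offsets stays intact
    out = [0] * len(keys)
    for k in keys:
        p = k & mask
        out[cursor[p]] = k
        cursor[p] += 1

    return out, offsets


keys = [7, 2, 5, 4, 1, 6, 3, 0]
out, offsets = radix_partition(keys, radix_bits=1)
# Even keys form partition 0 and odd keys form partition 1,
# each in original input order.
```

With `radix_bits=1` the keys split into two partitions, and `offsets` marks where each one begins and ends in `out`.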