Optimising Operator Sets for Analytical Database Processing on FPGAs

The high throughput and partial reconfiguration capabilities of modern FPGAs make them an attractive hardware platform for query processing in analytical database systems using overlay architectures. The design of existing systems is often based solely on hardware characteristics and thus does not account for all requirements of the application. In this paper, we identify two design issues that impede the system integration of low-level database operators for runtime-reconfigurable overlay architectures on FPGAs: first, the granularity of operator sets within each processing pipeline; second, the mapping of query (sub-)graphs to complex hardware operators. We solve these issues by modelling them as variants of the subgraph isomorphism problem. Via optimised operator fusion guided by a heuristic, we reduce the number of required reconfigurable regions by between 30% and 85% for relevant TPC-H database benchmark queries. This increase in area efficiency is achieved without performance penalties. In 86% of the iterations of the operator fusion process, the proposed heuristic finds optimal candidates, which is 3.6 times more often than a naive greedy approach.
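To illustrate what mapping a query (sub-)graph to a complex hardware operator means as a subgraph matching problem, the following is a minimal sketch in Python. It is not the method or heuristic of this paper: it is a brute-force search, and the operator names (`scan`, `filter`, `mul`, `sum`), the node identifiers, and the function `find_matches` are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def find_matches(query_ops, query_edges, pattern_ops, pattern_edges):
    """Brute-force labelled subgraph matching (illustrative only).

    Returns every mapping from pattern nodes to distinct query nodes that
    preserves operator types and maps each pattern edge onto a query edge.
    """
    q_nodes = list(query_ops)
    p_nodes = list(pattern_ops)
    q_edge_set = set(query_edges)
    matches = []
    # Try every injective assignment of pattern nodes to query nodes.
    for candidate in permutations(q_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, candidate))
        # Operator types must agree node by node.
        if any(pattern_ops[p] != query_ops[mapping[p]] for p in p_nodes):
            continue
        # Every data-flow edge of the pattern must exist in the query graph.
        if all((mapping[a], mapping[b]) in q_edge_set for a, b in pattern_edges):
            matches.append(mapping)
    return matches


# Hypothetical query plan: scan -> filter -> mul -> sum
query_ops = {"n0": "scan", "n1": "filter", "n2": "mul", "n3": "sum"}
query_edges = [("n0", "n1"), ("n1", "n2"), ("n2", "n3")]

# Hypothetical fused hardware operator: a multiply feeding a sum
pattern_ops = {"p0": "mul", "p1": "sum"}
pattern_edges = [("p0", "p1")]

matches = find_matches(query_ops, query_edges, pattern_ops, pattern_edges)
print(matches)  # [{'p0': 'n2', 'p1': 'n3'}]
```

The exhaustive enumeration grows factorially with the graph size, which is one reason a guiding heuristic is attractive for problems of this kind.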
