ReOrder: Runtime datapath generation for high-throughput multi-stream processing

Modern Programmable FPGA-based SoCs that tightly couple CPU and programmable logic enable the acceleration of stream processing in hardware on-demand by making use of the available high input and output throughputs and the reconfigurability both in software and hardware. In this paper, we present the concept and implementation of a hardware unit called ReOrder that serves as a converter for multiple parallel streams of data read from and written to an accelerator. Our technique and programmable design allows flexible data access and connects different stream processing accelerators independent of the host data layout. In order to achieve a high accelerator throughput, it is necessary to determine an optimized datapath according to the accelerator's internal schedule of input and output data. We are concerned with an online setting, in which either the data layout (e.g., in the case of modern database systems) or the accelerator operational mode change dynamically. Therefore, an algorithm is required which can be used at “runtime” in order to maintain an optimized datapath configuration. We propose an efficient heuristic algorithm and corresponding FPGA design that is able to translate arbitrary (multi-source) data layouts of the connected host system to generate any specified data stream of the accelerator at runtime within ms.

[1]  James C. Hoe,et al.  Automatic generation of streaming datapaths for arbitrary fixed permutations , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[2]  Sven Groppe,et al.  Automated composition and execution of hardware-accelerated operator graphs , 2015, 2015 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC).

[3]  Kizheppatt Vipin,et al.  DyRACT: A partial reconfiguration enabled accelerator and test platform , 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).

[4]  Viktor K. Prasanna,et al.  Automatic generation of high throughput energy efficient streaming architectures for arbitrary fixed permutations , 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).

[5]  Michael Stonebraker,et al.  C-Store: A Column-oriented DBMS , 2005, VLDB.

[6]  Gustavo Alonso,et al.  Glacier: a query-to-hardware compiler , 2010, SIGMOD Conference.

[7]  Ryan Kastner,et al.  RIFFA 2.0: A reusable integration framework for FPGA accelerators , 2013, 2013 23rd International Conference on Field programmable Logic and Applications.

[8]  Tom Feist,et al.  Vivado Design Suite , 2012 .

[9]  James C. Hoe,et al.  Permuting streaming data using RAMs , 2009, JACM.

[10]  Jürgen Teich,et al.  FPGA-Based Dynamically Reconfigurable SQL Query Processing , 2016, ACM Trans. Reconfigurable Technol. Syst..

[11]  Kunle Olukotun,et al.  Hardware acceleration of database operations , 2014, FPGA.

[12]  Leland L. Beck,et al.  Smallest-last ordering and clustering and graph coloring algorithms , 1983, JACM.

[13]  Jürgen Teich,et al.  On-the-fly Composition of FPGA-Based SQL Query Accelerators Using a Partially Reconfigurable Module Library , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.

[14]  Jürgen Teich,et al.  A co-design approach for accelerated SQL query processing via FPGA-based data filtering , 2015, 2015 International Conference on Field Programmable Technology (FPT).