OpenCL library of stream memory components targeting FPGAs

In recent years, high-level languages and compilers, such as OpenCL, have improved both productivity and FPGA adoption on a wider scale. One of the challenges in designing high-performance stream FPGA applications is the iterative, manual optimization of the numerous application buffers (e.g., arrays, FIFOs and scratch-pads). First, to achieve the desired throughput, the programmer must analyze the memory accesses of each application buffer and, based on the observed data locality, determine the optimal on-chip buffering and off-chip read/write access strategy. Second, to minimize throughput bottlenecks, the programmer has to carefully partition the limited on-chip memory resources among many application buffers. In this work, we present an FPGA OpenCL library of pre-optimized stream memory components (SMCs). The library contains three types of SMCs, which implement frequently applied data transformations: 1) stencil, 2) transpose and 3) tiling. The library generates SMCs that are optimized both for the specific data transformation they perform and for the user-specified data set size. Further, to ease the partitioning of on-chip memory resources among many application buffers, the library automatically maps application buffers to on-chip and off-chip memory resources. This is achieved by letting the programmer specify an on-chip memory budget for each component. For on-chip memory, the SMCs buffer data to exploit data locality and maximize reuse. For off-chip memory, the SMCs optimize read/write operations through data coalescing, bursting and prefetching. We show that, using the SMC library, the programmer can quickly generate scalable, pre-optimized stream application memory components, thus reaching throughput targets without time-consuming manual memory optimization.
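The abstract does not describe the SMC interface, so what follows is only a minimal sketch of the kind of on-chip buffering a stencil component performs to maximize reuse. It assumes a 3x3 box-sum stencil over a row-major input stream. It is a Python software model of the common line-buffer/sliding-window technique, not OpenCL code and not the library's API. The names `stencil3x3_linebuffer` and `stencil3x3_naive` are hypothetical. Two rows held in a line buffer plus a 3x3 shift-register window let every input element be read from the stream exactly once.

```python
def stencil3x3_linebuffer(stream, width, height):
    """Hypothetical model: 3x3 box-sum stencil fed by a row-major stream.

    Returns (outputs for interior pixels in row-major order, number of stream reads).
    """
    line = [[0] * width for _ in range(2)]  # on-chip line buffer: rows y-2 and y-1
    win = [[0] * 3 for _ in range(3)]       # 3x3 sliding window (shift registers)
    out, reads = [], 0
    it = iter(stream)
    for y in range(height):
        for x in range(width):
            v = next(it)                    # single "off-chip" read per element
            reads += 1
            for r in range(3):              # shift the window left by one column
                win[r][0], win[r][1] = win[r][1], win[r][2]
            # new right column: rows y-2 and y-1 from the line buffer, row y from the stream
            win[0][2], win[1][2], win[2][2] = line[0][x], line[1][x], v
            # age column x of the line buffer by one row
            line[0][x], line[1][x] = line[1][x], v
            if y >= 2 and x >= 2:           # window covers rows y-2..y, cols x-2..x
                out.append(sum(sum(row) for row in win))
    return out, reads


def stencil3x3_naive(img, width, height):
    """Reference version that re-reads 9 elements for every output pixel."""
    return [
        sum(img[(y + dy) * width + (x + dx)] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]


if __name__ == "__main__":
    W, H = 8, 6
    img = list(range(W * H))
    out, reads = stencil3x3_linebuffer(img, W, H)
    assert out == stencil3x3_naive(img, W, H)
    print(reads, len(out))  # 48 reads for 24 outputs; the naive version performs 216
```

For an 8x6 image, the buffered model performs 48 reads for 24 outputs, while the naive version performs 9 reads per output (216 in total). In exchange, it spends roughly two image rows of on-chip storage. This illustrates the on-chip versus off-chip trade-off that a per-component memory budget has to balance.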
