Efficient Data Transfer Method for Image Filtering Implementation on FPGA Using OpenCL

Heterogeneous platforms, which commonly consist of a central processing unit (CPU) and a graphics processing unit (GPU), have attracted considerable attention as a way to achieve both high performance and low power consumption. Modern heterogeneous platforms often also employ a field-programmable gate array (FPGA) in addition to the CPU and GPU. The Open Computing Language (OpenCL) was developed to make full use of such heterogeneous hardware accelerators. This paper proposes an FPGA implementation of image filtering with efficient data transfer using OpenCL. To exploit the configurable pipelined architecture of the target FPGA, an effective local memory allocation scheme is proposed for the convolution kernel, and loop unrolling is applied to increase local memory allocation efficiency. With the proposed method, the average local memory access latency is significantly reduced for various memory access patterns. In addition, by efficiently utilizing the hardware resources of the target FPGA, the proposed filtering kernel achieves better performance per watt than a functionally equivalent GPU implementation.
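
As a rough illustration of the two ideas named above (staging image data in a local buffer, and unrolling the filter loop), the following plain C sketch emulates a tiled 3x3 filter on the host. It is not the kernel proposed in this paper: the function name `filter3x3_tiled`, the tile size `TILE`, the replicate-border policy, and the use of a stack array in place of OpenCL `__local` memory are all assumptions made for illustration. In an actual OpenCL kernel the nine multiply-accumulate lines would more likely be written as a short loop carrying an unroll hint such as `#pragma unroll`, where the compiler supports it.

```c
#include <stddef.h>

#define K     3              /* filter width and height (assumed) */
#define TILE  8              /* output tile edge length (assumed) */
#define LTILE (TILE + K - 1) /* tile plus a one-pixel halo on each side */

/* Clamp an index into [0, n-1]; replicate-border handling is an assumption. */
static int clamp_idx(int v, int n)
{
    return v < 0 ? 0 : (v >= n ? n - 1 : v);
}

/* Hypothetical host-side emulation of a tiled 3x3 filter (correlation form,
 * i.e. the coefficients are not flipped). */
void filter3x3_tiled(const float *src, float *dst,
                     int width, int height, const float coeff[K * K])
{
    float tile[LTILE][LTILE]; /* stands in for OpenCL __local memory */

    for (int ty = 0; ty < height; ty += TILE) {
        for (int tx = 0; tx < width; tx += TILE) {
            /* Stage 1: read each needed pixel from "global" memory once. */
            for (int y = 0; y < LTILE; ++y) {
                for (int x = 0; x < LTILE; ++x) {
                    int gy = clamp_idx(ty + y - 1, height);
                    int gx = clamp_idx(tx + x - 1, width);
                    tile[y][x] = src[(size_t)gy * width + gx];
                }
            }
            /* Stage 2: compute outputs from the local buffer only. */
            for (int y = 0; y < TILE && ty + y < height; ++y) {
                for (int x = 0; x < TILE && tx + x < width; ++x) {
                    /* Fully unrolled 3x3 window: nine independent reads. */
                    float acc = 0.0f;
                    acc += coeff[0] * tile[y][x];
                    acc += coeff[1] * tile[y][x + 1];
                    acc += coeff[2] * tile[y][x + 2];
                    acc += coeff[3] * tile[y + 1][x];
                    acc += coeff[4] * tile[y + 1][x + 1];
                    acc += coeff[5] * tile[y + 1][x + 2];
                    acc += coeff[6] * tile[y + 2][x];
                    acc += coeff[7] * tile[y + 2][x + 1];
                    acc += coeff[8] * tile[y + 2][x + 2];
                    dst[(size_t)(ty + y) * width + (tx + x)] = acc;
                }
            }
        }
    }
}
```

In this sketch each tile reads (TILE + 2)^2 pixels from the source image instead of 9 * TILE^2, which is the kind of global-memory traffic reduction that local buffering is meant to provide. Unrolling the window exposes nine independent local reads that a pipelined datapath could, in principle, issue in parallel.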