Using OpenCL to rapidly prototype FPGA designs

Field Programmable Gate Arrays (FPGAs) have gained popularity because their reconfigurability can speed up development and verification at relatively low cost. However, the deep understanding of hardware logic programming that FPGA design requires has discouraged many software engineers. An interface between host devices and FPGAs that lets engineers design and program FPGAs with a software programming standard, while encapsulating hardware details, is therefore highly desirable. In this paper we evaluate the use of the Open Computing Language (OpenCL) to rapidly design FPGAs, considering both hardware logic utilization efficiency and computing performance. On a heterogeneous computer system consisting of ARM processors and an Altera FPGA, we execute an OpenCL host program on the ARM processors and an OpenCL kernel on the FPGA to compute a parametrizable two-dimensional Mandelbrot fractal. To improve FPGA computation performance, we explore three design aspects: adjusting the OpenCL work-group size, coalescing memory accesses, and replicating compute units. After optimizing the core algorithm, we reduced the logic utilization and the number of Digital Signal Processing (DSP) blocks required for a single compute unit, which allowed us to increase the number of replicated compute units from four to six. This delivers a 1.5X increase in the parallel computation capacity of the FPGA, improving computing speed by 1.5X and memory bandwidth by 1.7X.
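The abstract does not reproduce the kernel itself. As a rough illustration of how a Mandelbrot computation maps onto OpenCL's execution model, the Python sketch below emulates a 2-D NDRange on the host. Each call to the hypothetical `mandelbrot_item` plays the role of one work-item running the standard escape-time iteration z = z² + c for one pixel, and the hypothetical `run_ndrange` walks the index space in work-groups of an assumed shape. The image size, iteration cap, and work-group shape are illustrative assumptions, not values from the paper. Each work-item writes to a row-major buffer at index `gy * width + gx`, so neighbouring work-items along x touch neighbouring addresses, the kind of contiguous access pattern that memory coalescing relies on.

```python
# Hypothetical image size, iteration cap, and work-group shape (not from the paper).
WIDTH, HEIGHT = 64, 48
MAX_ITER = 256
WG_X, WG_Y = 16, 4


def mandelbrot_item(gx, gy, x0, y0, dx, dy, max_iter=MAX_ITER):
    """Body of one work-item: escape-time iteration z <- z*z + c for pixel (gx, gy)."""
    c = complex(x0 + gx * dx, y0 + gy * dy)
    z = 0j
    n = 0
    while n < max_iter and abs(z) <= 2.0:
        z = z * z + c
        n += 1
    return n  # iteration count; MAX_ITER means "did not escape"


def run_ndrange(width, height, x0, y0, x1, y1, wg=(WG_X, WG_Y), max_iter=MAX_ITER):
    """Emulate a 2-D NDRange over [x0, x1) x [y0, y1) split into work-groups of shape wg."""
    dx = (x1 - x0) / width
    dy = (y1 - y0) / height
    out = [0] * (width * height)  # flat, row-major output buffer
    wg_x, wg_y = wg
    for gy0 in range(0, height, wg_y):
        for gx0 in range(0, width, wg_x):
            # One work-group: on a device these work-items would execute concurrently.
            for ly in range(wg_y):
                for lx in range(wg_x):
                    gx, gy = gx0 + lx, gy0 + ly
                    if gx >= width or gy >= height:
                        continue  # guard for sizes not divisible by the group shape
                    # Consecutive lx map to consecutive addresses: a coalescable pattern.
                    out[gy * width + gx] = mandelbrot_item(gx, gy, x0, y0, dx, dy, max_iter)
    return out


if __name__ == "__main__":
    img = run_ndrange(WIDTH, HEIGHT, -2.0, -1.2, 1.0, 1.2)
    for gy in range(0, HEIGHT, 4):  # every 4th row as coarse ASCII art
        row = img[gy * WIDTH:(gy + 1) * WIDTH]
        print("".join("#" if v == MAX_ITER else "." for v in row))
```

Because every pixel is computed independently, the result does not depend on the work-group shape; on the FPGA the shape instead influences how work-items are scheduled and how their memory accesses are grouped. On the device, the work-group shape would typically be fixed with the standard OpenCL `reqd_work_group_size` kernel attribute, and the Altera SDK exposes compute-unit replication through its `num_compute_units` attribute; this host-side sketch models neither.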
