Bit-Width Based Resource Partitioning for CNN Acceleration on FPGA

Convolutional neural networks (CNNs) have achieved great success in many applications. Recently, various FPGA-based accelerators have been proposed to improve the performance of CNNs. However, most current FPGA-based methods use a single bit-width for all CNN layers, which leads to very low resource utilization efficiency and makes further performance improvement difficult. In this paper, we propose a new approach that partitions FPGA DSP resources by bit-width to improve the performance and resource utilization efficiency of CNN accelerators. Moreover, we use an optimization approach to find the optimal allocation plan for DSP resources. On a Xilinx Virtex-7 FPGA, our design approach outperforms state-of-the-art FPGA-based CNN accelerators by 5.48x to 7.25x, and by 6.21x on average, when evaluated on popular CNNs.
