A Software-Hardware collaboration system for CNN algorithms based on FPGA
In this paper, an SoC system combining an ARM processor with a convolution accelerator is designed for CNN algorithms on the ZC706 evaluation board. Using tiling and loop reorganization, the system achieves a high data reuse rate, which greatly reduces the data bandwidth required between the on-chip buffers and DDR memory. The convolution accelerator supports kernel sizes from $1 \times 1$ to $11 \times 11$, and the supported activation functions are ReLU and Leaky ReLU. The ARM processor of the SoC is mainly responsible for control and for the remaining CNN computations, such as LRN and pooling, which makes the system more versatile and flexible. At a working frequency of 100 MHz, the peak performance reaches 45.16 GFLOPS, which is 142.8× faster than the Cortex-A9, and the energy efficiency is 219.5× better than that of the i7-4790K.
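The tiling and loop-reorganization idea can be illustrated with a minimal Python sketch. It assumes NCHW layout, stride 1, no padding, and hypothetical tile sizes `TM`, `TN`, `TR`, `TC`. The abstract gives neither the actual loop order nor the tile sizes, and the Leaky ReLU slope of 0.1 is also an assumption. Python lists stand in for the on-chip buffers, and `conv_naive` is included only as a reference for checking the result.

```python
# Minimal sketch of a tiled convolution loop nest in the style of an FPGA accelerator.
# Assumptions (not from the paper): NCHW layout, stride 1, no padding, single image,
# hypothetical tile sizes TM/TN/TR/TC, Leaky ReLU slope 0.1.

def relu(x):
    return x if x > 0 else 0.0

def leaky_relu(x, slope=0.1):  # slope value is an assumption
    return x if x > 0 else slope * x

def conv_tiled(ifm, weights, K, act, TM=4, TN=4, TR=4, TC=4):
    """ifm[N][H][W], weights[M][N][K][K] -> ofm[M][H-K+1][W-K+1]."""
    N, H, W = len(ifm), len(ifm[0]), len(ifm[0][0])
    M = len(weights)
    if not 1 <= K <= 11:
        raise ValueError("kernel size must be between 1x1 and 11x11")
    R, C = H - K + 1, W - K + 1
    ofm = [[[0.0] * C for _ in range(R)] for _ in range(M)]
    # Outer loops step over output tiles (rows, columns, output channels).
    for r0 in range(0, R, TR):
        for c0 in range(0, C, TC):
            for m0 in range(0, M, TM):
                tr, tc, tm = min(TR, R - r0), min(TC, C - c0), min(TM, M - m0)
                # Stand-in for the on-chip output buffer: partial sums stay here
                # across all input-channel tiles instead of going back to DDR.
                obuf = [[[0.0] * tc for _ in range(tr)] for _ in range(tm)]
                for n0 in range(0, N, TN):
                    tn = min(TN, N - n0)
                    # Kernel loops outermost inside the tile; the pixel/channel loops
                    # below are the ones an HLS tool would typically unroll or pipeline.
                    for i in range(K):
                        for j in range(K):
                            for rr in range(tr):
                                for cc in range(tc):
                                    for mm in range(tm):
                                        for nn in range(tn):
                                            obuf[mm][rr][cc] += (
                                                weights[m0 + mm][n0 + nn][i][j]
                                                * ifm[n0 + nn][r0 + rr + i][c0 + cc + j])
                # Apply the activation once per finished output, then write the tile back.
                for mm in range(tm):
                    for rr in range(tr):
                        for cc in range(tc):
                            ofm[m0 + mm][r0 + rr][c0 + cc] = act(obuf[mm][rr][cc])
    return ofm

def conv_naive(ifm, weights, K, act):
    """Direct convolution used only as a correctness reference."""
    N, H, W = len(ifm), len(ifm[0]), len(ifm[0][0])
    R, C = H - K + 1, W - K + 1
    return [[[act(sum(weights[m][n][i][j] * ifm[n][r + i][c + j]
                      for n in range(N) for i in range(K) for j in range(K)))
              for c in range(C)] for r in range(R)] for m in range(len(weights))]
```

Within one tile, each input pixel read into the buffer is reused by `tm` output channels and each weight by `tr * tc` output pixels. Only finished outputs leave `obuf`, so reordering the loops this way reduces off-chip traffic without changing the result.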