OctCNN: An Energy-Efficient FPGA Accelerator for CNNs using Octave Convolution Algorithm

Recently, embedded FPGAs have been explored as a potential platform for deploying machine learning on edge devices due to their high energy efficiency and low cost. However, their limited resources make deploying CNNs on FPGAs more challenging. In this paper, we present OctCNN, which utilizes the octave convolution (OctConv) algorithm to optimize an FPGA-based CNN accelerator. We first propose a novel architecture for deploying OctConv on FPGAs and then present a resource and performance analysis model to guide fast design space exploration. As a case study, we implement a classic CNN model, VGG16, on a Xilinx ZC702. Results show that, compared with a mobile-class CPU and GPU, OctCNN achieves $\mathbf{16.88}\times$ and $\mathbf{2.43}\times$ higher energy efficiency, respectively. It also delivers promising energy efficiency compared with previous FPGA accelerators.
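As a reference for the computation that OctCNN maps onto the FPGA, the following is a minimal PyTorch sketch of a single octave convolution layer, assuming the standard OctConv formulation (feature maps split into a high-frequency part at full resolution and a low-frequency part at half resolution, with a channel-split ratio alpha). The class name, parameter defaults, and the use of average pooling and nearest-neighbor upsampling are illustrative assumptions for this sketch, not details of the OctCNN accelerator design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv2d(nn.Module):
    """Minimal sketch of an octave convolution layer (assumed OctConv formulation)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, alpha=0.5, padding=1):
        super().__init__()
        # alpha is the fraction of channels carried at half (low-frequency) resolution.
        self.in_lo = int(alpha * in_ch)
        self.in_hi = in_ch - self.in_lo
        self.out_lo = int(alpha * out_ch)
        self.out_hi = out_ch - self.out_lo
        # Four convolution paths: high->high, high->low, low->high, low->low.
        self.conv_hh = nn.Conv2d(self.in_hi, self.out_hi, kernel_size, padding=padding)
        self.conv_hl = nn.Conv2d(self.in_hi, self.out_lo, kernel_size, padding=padding)
        self.conv_lh = nn.Conv2d(self.in_lo, self.out_hi, kernel_size, padding=padding)
        self.conv_ll = nn.Conv2d(self.in_lo, self.out_lo, kernel_size, padding=padding)

    def forward(self, x_hi, x_lo):
        # High-frequency output: full-resolution conv plus upsampled low-to-high path.
        y_hi = self.conv_hh(x_hi) + F.interpolate(self.conv_lh(x_lo),
                                                  scale_factor=2, mode="nearest")
        # Low-frequency output: half-resolution conv plus pooled high-to-low path.
        y_lo = self.conv_ll(x_lo) + self.conv_hl(F.avg_pool2d(x_hi, kernel_size=2))
        return y_hi, y_lo

# Example: a 3x3 OctConv layer with alpha = 0.5 applied to a 64-channel feature map.
if __name__ == "__main__":
    layer = OctConv2d(in_ch=64, out_ch=128, alpha=0.5)
    x_hi = torch.randn(1, 32, 56, 56)   # high-frequency half of the channels, full resolution
    x_lo = torch.randn(1, 32, 28, 28)   # low-frequency half of the channels, half resolution
    y_hi, y_lo = layer(x_hi, x_lo)
    print(y_hi.shape, y_lo.shape)       # (1, 64, 56, 56) and (1, 64, 28, 28)
```

Because the low-frequency branch operates at half spatial resolution, its convolutions cost roughly a quarter of the multiply-accumulate operations of the equivalent full-resolution paths, which is the source of the compute and memory savings that OctCNN exploits on resource-constrained FPGAs.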
