An Approach to the Energy-Efficient Implementation of Binary Neural Networks

Binarized neural networks (BNNs), which have 1-bit weights and activations, are well suited to FPGA accelerators: their dominant computations are bitwise operations, and their reduced memory footprint allows all network parameters to be stored in on-chip memory. However, the energy efficiency of these accelerators is still limited by the abundant redundancy in BNNs, which hinders their deployment in smart sensors and tiny devices, scenarios with tight energy budgets. To overcome this problem, we propose an approach that implements BNN inference with excellent energy efficiency by pruning the massive redundant operations while maintaining the original accuracy of the networks. First, motivated by the observation that the convolutions of two related kernels contain many repeated computations, we derive a formula that captures the reuse relationship between their convolutional outputs and use it to remove the unnecessary operations. Furthermore, by generalizing this reuse relationship to a tile of kernels within one neuron, we adopt an inclusion pruning strategy that skips the superfluous evaluation of neurons whose output values can be determined early. Finally, we evaluate our system on the Zynq 7000 XC7Z100 FPGA platform. Our design prunes 51 percent of the operations without any accuracy loss, and its energy efficiency reaches 6.55 × 10⁵ Img/kJ, which is 118× better than the best accelerator based on an NVIDIA Tesla V100 GPU and 3.6× higher than state-of-the-art FPGA implementations of BNNs.
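As a rough sketch of the two pruning ideas summarized above, the Python fragment below assumes the standard XNOR-popcount formulation of binary convolution, with kernels and inputs packed into integer bit vectors and equal bits contributing +1 to the dot product; the function names, the tiling interface, and the early-exit bounds are illustrative assumptions, not the accelerator's actual implementation.

    def bin_dot(w, x, n):
        # +/-1 dot product over n packed bits via XNOR-popcount:
        # matching bits contribute +1, mismatching bits contribute -1.
        return n - 2 * bin(w ^ x).count("1")

    def reuse_dot(dot_w1, w1, w2, x):
        # Output reuse between two related kernels: derive kernel w2's
        # dot product from w1's by touching only the bits where they differ.
        d = w1 ^ w2                         # positions where w1 and w2 disagree
        flipped = bin((w1 ^ x) & d).count("1")
        return dot_w1 - 2 * (bin(d).count("1") - 2 * flipped)

    def early_sign(compute_tile, tile_sizes, threshold):
        # Inclusion-style pruning: accumulate per-tile contributions and stop
        # as soon as the sign of (sum - threshold) can no longer change, so
        # the remaining tiles are never evaluated.
        remaining = sum(tile_sizes)         # max magnitude still outstanding
        acc = 0
        for i, n in enumerate(tile_sizes):
            acc += compute_tile(i)          # tile dot product lies in [-n, +n]
            remaining -= n
            if acc - remaining >= threshold:
                return 1                    # +1 even if the rest are all -1
            if acc + remaining < threshold:
                return -1                   # -1 even if the rest are all +1
        return 1 if acc >= threshold else -1

A quick consistency check of the reuse formula: for any n-bit packed vectors, reuse_dot(bin_dot(w1, x, n), w1, w2, x) equals bin_dot(w2, x, n), so only the bit positions where w1 and w2 differ ever need to be re-examined.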
