Towards High Performance and Accurate BNN Inference on FPGA with Structured Fine-grained Pruning

As the extreme case of network quantization, Binary Neural Networks (BNNs) have received tremendous attention due to their many hardware-friendly properties in terms of storage and computation. To push compact models to the limit, we combine binarization with pruning techniques, further exploring the redundancy of BNNs. However, coarse-grained pruning methods can cause severe accuracy drops, while traditional fine-grained ones induce irregular sparsity that hardware struggles to exploit. In this paper, we propose two advanced fine-grained BNN pruning modules, i.e., structured channel-wise kernel pruning and dynamic spatial pruning, designed from a joint algorithm-hardware perspective. The pruned BNN models are trained from scratch and deliver not only higher accuracy but also a high degree of parallelism. We then develop an accelerator architecture that effectively exploits the sparsity produced by our algorithm. Finally, we implement the pruned BNN models on an embedded FPGA (Ultra96v2). The results show that our software-hardware co-design achieves a 5.4x inference speedup over the baseline BNN, with higher resource and energy efficiency than prior FPGA-based BNN implementations.
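The abstract does not include pseudocode for the pruning modules, so the following is a minimal illustrative sketch of what structured channel-wise kernel pruning could look like in PyTorch. The function name `channelwise_kernel_prune`, the `keep` hyperparameter, and the L1-norm saliency score are assumptions for illustration only; the paper's actual selection rule and train-from-scratch procedure may differ.

```python
import torch

def channelwise_kernel_prune(latent_w: torch.Tensor, keep: int) -> torch.Tensor:
    """Structured channel-wise kernel pruning (illustrative sketch).

    latent_w : real-valued latent weights of a binary conv layer,
               shape (C_out, C_in, kH, kW).
    keep     : number of input-channel kernels retained per output channel
               (an assumed hyperparameter, not taken from the paper).

    Returns a {0, 1} mask of shape (C_out, C_in, 1, 1). Every output
    channel keeps exactly `keep` whole kernels, so the sparsity pattern
    is regular across filters.
    """
    c_out, c_in, kh, kw = latent_w.shape
    # Per-kernel saliency: L1 norm of the latent weights (an assumption;
    # any per-kernel importance score fits the same structure).
    saliency = latent_w.abs().sum(dim=(2, 3))          # (C_out, C_in)
    idx = saliency.topk(keep, dim=1).indices           # (C_out, keep)
    mask = torch.zeros(c_out, c_in, device=latent_w.device)
    mask.scatter_(1, idx, 1.0)                         # mark kept kernels
    return mask.view(c_out, c_in, 1, 1)

# Example: prune a 64x128 binary conv layer to 4 kernels per output channel.
w = torch.randn(64, 128, 3, 3)       # latent weights before binarization
mask = channelwise_kernel_prune(w, keep=4)
w_binary = torch.sign(w) * mask      # binarize, then zero the pruned kernels
```

Because every output channel retains the same number of kernels, an accelerator can schedule the same number of XNOR-popcount operations per filter; this regularity is what makes the sparsity hardware-friendly, in contrast to the irregular patterns left by traditional fine-grained pruning.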
