FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations

Binary neural networks (BNNs) have 1-bit weights and activations. Such networks are well suited for FPGAs, as their dominant computations are bitwise arithmetic and their memory footprint is significantly reduced. However, compared to state-of-the-art compact convolutional neural network (CNN) models, BNNs tend to produce a much lower accuracy on realistic datasets such as ImageNet. In addition, the input layer of BNNs has gradually become a major compute bottleneck, because it is conventionally excluded from binarization to avoid a large accuracy loss. This work proposes FracBNN, which exploits fractional activations to substantially improve the accuracy of BNNs. Specifically, our approach employs a dual-precision activation scheme to compute features with up to two bits, using an additional sparse binary convolution. We further binarize the input layer using a novel thermometer encoding. Overall, FracBNN preserves the key benefits of conventional BNNs, where all convolutional layers are computed in pure binary MAC operations (BMACs). We design an efficient FPGA-based accelerator for our novel BNN model that supports the fractional activations. To evaluate the performance of FracBNN under a resource-constrained scenario, we implement the entire optimized network architecture on an embedded FPGA (Xilinx Ultra96 v2). Our experiments on ImageNet show that FracBNN achieves an accuracy comparable to MobileNetV2, surpassing the best-known BNN design on FPGAs with a 28.9% increase in top-1 accuracy and a 2.5× reduction in model size. FracBNN also outperforms a recently introduced BNN model by 2.4% in top-1 accuracy while using the same model size. On the embedded FPGA device, FracBNN demonstrates real-time image classification.
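
To make the two key ideas concrete, the sketch below gives one plausible PyTorch reading of (a) thermometer encoding of the 8-bit input image, which lets the first layer run on binary MACs, and (b) a dual-precision ("fractional") binary convolution, where a second activation bit is added through a sparse binary convolution. The function and class names, the uniform thresholds, and the static gating rule are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch, not the FracBNN codebase: thermometer-encoded inputs
# plus a dual-precision binary convolution with a sparse second-bit path.
import torch
import torch.nn as nn
import torch.nn.functional as F


def thermometer_encode(x_uint8: torch.Tensor, num_levels: int = 8) -> torch.Tensor:
    """Expand each 8-bit channel into `num_levels` binary channels.

    Bit t is 1 iff the pixel exceeds the t-th uniform threshold, so the
    code is monotone (a "thermometer") rather than positional binary.
    """
    thresholds = torch.arange(num_levels, dtype=torch.float32) * (256.0 / num_levels)
    x = x_uint8.float().unsqueeze(2)                      # (N, C, 1, H, W)
    bits = (x > thresholds.view(1, 1, -1, 1, 1)).float()  # (N, C, T, H, W)
    return bits.flatten(1, 2)                             # (N, C*T, H, W)


class FractionalBinaryConv(nn.Module):
    """Dense 1-bit conv plus a gated, sparse second-bit binary conv."""

    def __init__(self, in_ch, out_ch, kernel_size=3, gate_threshold=0.5):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.gate_threshold = gate_threshold  # hypothetical static gate

    def forward(self, x):
        pad = self.weight.shape[-1] // 2
        wb = self.weight.sign()             # 1-bit weights; sign(0)=0, a real
        msb = x.sign()                      # design would map zeros to +1
        base = F.conv2d(msb, wb, padding=pad)  # dense BMAC-friendly conv
        # Second bit only where |x| is large; elsewhere the input is zero,
        # so this residual convolution is sparse and cheap in hardware.
        lsb = msb * (x.abs() > self.gate_threshold).float()
        return base + 0.5 * F.conv2d(lsb, wb, padding=pad)


if __name__ == "__main__":
    img = torch.randint(0, 256, (1, 3, 32, 32), dtype=torch.uint8)
    bits = thermometer_encode(img, num_levels=8)    # (1, 24, 32, 32) in {0, 1}
    y = FractionalBinaryConv(24, 64)(bits * 2 - 1)  # map {0, 1} -> {-1, +1}
    print(y.shape)                                  # torch.Size([1, 64, 32, 32])
```

Since every operand of both convolutions is ±1 (or 0 where the gate is closed), both reduce to the XNOR-popcount operations the abstract calls BMACs, and the gated second bit stays sparse, which is what keeps the extra convolution cheap.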
