Energy-Efficient Machine Learning Accelerator for Binary Neural Networks

Binary neural networks (BNNs) have shown great potential for power-efficient, high-throughput implementation. Compared with their full-precision counterparts, conventional convolutional neural networks (CNNs), BNNs are trained with binary-constrained weights and activations, making them well suited to edge devices with limited computing and storage resources. In this paper, we introduce BNN characteristics, basic operations, and binarized-network optimization methods. We then summarize several accelerator designs for BNN hardware implementation across three mainstream platforms, i.e., ReRAM-based crossbars, FPGAs, and ASICs. Exploiting BNN characteristics and custom hardware design, all of these methods achieve massively parallel computation and highly pipelined data flow to improve latency and throughput. In addition, binary-format intermediate data are stored and processed on chip through a computing-in-memory (CIM) architecture, reducing off-chip communication costs in both power and latency.
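As a minimal illustration of the basic BNN operation referred to above (a sketch for intuition, not code from any of the surveyed designs): with weights and activations constrained to {-1, +1} and encoded as single bits, a multiply-accumulate reduces to a bitwise XNOR followed by a popcount, which is what makes the hardware implementations so compact.

```python
import numpy as np

def binarize(x):
    # Standard BNN binarization: sign function mapping reals to {-1, +1}
    # (zero maps to +1 by convention here).
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dot(w, a):
    # With {-1, +1} values encoded as bits {0, 1}, a dot product of
    # length n becomes: dot = 2 * popcount(XNOR(w_bits, a_bits)) - n.
    w_bits = w > 0
    a_bits = a > 0
    matches = np.count_nonzero(~(w_bits ^ a_bits))  # XNOR, then popcount
    return 2 * matches - len(w)

# The XNOR-popcount result matches the ordinary {-1, +1} dot product.
w = binarize(np.array([0.5, -1.2, 0.3, -0.7]))
a = binarize(np.array([-0.1, -2.0, 0.4, 0.9]))
assert binary_dot(w, a) == int(np.dot(w, a))
```

In an accelerator, the XNOR and popcount map directly onto wide bitwise logic (or onto crossbar current summation in the ReRAM-based designs), which is why binary layers admit the massively parallel, pipelined datapaths the survey describes.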
