Accelerating Inference on Binary Neural Networks with Digital RRAM Processing

The need for efficient Convolutional Neural Networks (CNNs) targeting embedded systems has led to the popularization of Binary Neural Networks (BNNs), which significantly reduce execution time and memory requirements by representing operands with a single bit. Moreover, since roughly 90% of the operations executed by CNNs and BNNs are convolutions, a quest has begun for custom accelerators that optimize the convolution operation and reduce data movement, in which Resistive Random Access Memory (RRAM)-based accelerators have proven to be of particular interest. This work presents a custom Binary Dot Product Engine (BDPE) for BNNs that exploits the low-level compute capabilities enabled by RRAMs. The new engine accelerates the inference phase of BNNs by locally storing the most frequently used kernels and performing binary convolutions using RRAM devices and optimized custom circuitry. Results show that the novel BDPE improves performance by 11.3% and energy efficiency by 7.4%, and reduces the number of memory accesses by 10.7%, at a cost of less than 0.3% additional die area.
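The binary dot product that such an engine computes in hardware can be sketched in software. The snippet below is a minimal illustration (not the paper's implementation): it assumes the two operand vectors are packed as 8-bit masks, with each bit encoding a {-1, +1} weight or activation, and computes their dot product via XNOR and popcount, the standard BNN convolution primitive.

```python
def binary_dot(a: int, b: int, n: int = 8) -> int:
    """Dot product of two n-bit binarized vectors packed into integers.

    Each bit encodes a value in {-1, +1} (bit 1 -> +1, bit 0 -> -1).
    XNOR marks positions where the two values agree; the dot product
    is (#matches - #mismatches) = 2 * #matches - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a ^ b) & mask          # 1 where bits agree
    matches = bin(xnor).count("1")  # popcount of agreements
    return 2 * matches - n

# Example: 5 matching bit positions out of 8 -> dot product 2
print(binary_dot(0b10110011, 0b10101010))
```

A crossbar-based engine evaluates the same XNOR-popcount reduction in a single analog-free read cycle across many weight columns, which is what eliminates most of the multiply-accumulate data movement.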
