An energy-efficient and high-throughput bitwise CNN on sneak-path-free digital ReRAM crossbar

Convolutional neural network (CNN) based machine learning requires a highly parallel as well as low power consumption (including leakage power) hardware accelerator. In this paper, we will present a digital ReRAM crossbar based CNN accelerator that can achieve significantly higher throughput and lower power consumption than state-of-arts. The CNN is trained with binary constraints on both weights and activations such that all operations become bitwise. With further use of 1-bit comparator, the bitwise CNN model can be naturally realized on a digital ReRAM-crossbar device. A novel sneak-path-free ReRAM-crossbar is further utilized for large-scale realization. Simulation experiments show that the bitwise CNN accelerator on the digital ReRAM crossbar achieves 98.3% and 91.4% accuracy on MNIST and CIFAR-10 benchmarks, respectively. Moreover, it has a peak throughput of 792GOPS at the power consumption of 6.3mW, which is 18.86 times higher throughput and 44.1 times lower power than CMOS CNN (non-binary) accelerators.

[1]  Hao Yu,et al.  On-line machine learning accelerator on digital RRAM-crossbar , 2016, 2016 IEEE International Symposium on Circuits and Systems (ISCAS).

[2]  Luca Benini,et al.  YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[3]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[4]  Zhiwei Li,et al.  Binary neural network with 16 Mb RRAM macro chip for classification and online training , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).

[5]  Yu Wang,et al.  Binary convolutional neural network on RRAM , 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).

[6]  Geoffrey E. Hinton,et al.  Replicated Softmax: an Undirected Topic Model , 2009, NIPS.

[7]  Hao Yu,et al.  An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar , 2016, ASP-DAC.

[8]  Yu Wang,et al.  Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[9]  Kate J. Norris,et al.  Anatomy of Ag/Hafnia‐Based Selectors with 1010 Nonlinearity , 2017, Advanced materials.

[10]  Hao Yu,et al.  A Binary Convolutional Encoder-decoder Network for Real-time Natural Scene Text Processing , 2016, ArXiv.

[11]  Tao Zhang,et al.  PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[12]  O. Richard,et al.  10×10nm2 Hf/HfOx crossbar resistive RAM with excellent performance, reliability and low-energy operation , 2011, 2011 International Electron Devices Meeting.

[13]  Rajiv V. Joshi,et al.  An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar , 2016, 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC).

[14]  Joel Emer,et al.  Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks Accessed Terms of Use , 2022 .

[15]  Yiran Chen,et al.  A neuromorphic ASIC design using one-selector-one-memristor crossbar , 2016, 2016 IEEE International Symposium on Circuits and Systems (ISCAS).

[16]  Catherine Graves,et al.  Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[17]  J Joshua Yang,et al.  Memristive devices for computing. , 2013, Nature nanotechnology.

[18]  D. Stewart,et al.  The missing memristor found , 2008, Nature.

[19]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[20]  Horst Zimmermann,et al.  A 65nm CMOS comparator with modified latch to achieve 7GHz/1.3mW at 1.2V and 700MHz/47µW at 0.6V , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[21]  Narayan Srinivasa,et al.  A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. , 2012, Nano letters.

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  J. Yang,et al.  Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing. , 2017, Nature materials.

[24]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[25]  V JoshiRajiv,et al.  Distributed In-Memory Computing on Binary RRAM Crossbar , 2017 .

[26]  Jason Cong,et al.  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[27]  Yoshua Bengio,et al.  BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[28]  Song Han,et al.  ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.