Paper information - An energy-efficient and high-throughput bitwise CNN on sneak-path-free digital ReRAM crossbar

Machine learning based on convolutional neural networks (CNNs) requires a hardware accelerator that is both highly parallel and low in power consumption, including leakage power. In this paper, we present a CNN accelerator based on a digital ReRAM crossbar that achieves significantly higher throughput and lower power consumption than state-of-the-art designs. The CNN is trained with binary constraints on both weights and activations so that all operations become bitwise. With the further use of a 1-bit comparator, the bitwise CNN model can be realized naturally on a digital ReRAM crossbar device. A novel sneak-path-free ReRAM crossbar is further used for large-scale realization. Simulation experiments show that the bitwise CNN accelerator on the digital ReRAM crossbar achieves 98.3% and 91.4% accuracy on the MNIST and CIFAR-10 benchmarks, respectively. Moreover, it has a peak throughput of 792 GOPS at a power consumption of 6.3 mW, which is 18.86 times higher throughput and 44.1 times lower power than non-binary CMOS CNN accelerators.
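To make concrete what "all operations become bitwise" can mean, here is a minimal software sketch of a binary dot product followed by a 1-bit comparison. It uses the common XNOR-and-popcount formulation for {+1, -1} values, which is an assumption here: the abstract does not spell out the exact mapping onto the ReRAM crossbar, and the function names and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def binarize(x):
    """Map real values to {+1, -1} by sign (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def to_bits(x_pm1):
    """Encode +1 as bit 1 and -1 as bit 0."""
    return (x_pm1 > 0).astype(np.uint8)

def xnor_popcount_dot(a_bits, w_bits):
    """Dot product of two {+1, -1} vectors computed from their bit encodings.

    XNOR marks the positions where the two bits agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    n = a_bits.size
    matches = int(np.sum(~(a_bits ^ w_bits) & 1))  # XNOR, then popcount
    return 2 * matches - n

def binary_neuron(a_bits, w_bits, threshold=0):
    """1-bit output: compare the accumulated sum against a threshold.

    The threshold is a hypothetical stand-in for whatever the 1-bit
    comparator is compared against in the actual design.
    """
    return 1 if xnor_popcount_dot(a_bits, w_bits) >= threshold else 0

# Small usage example with made-up values.
a = binarize(np.array([0.5, -1.2, 0.3, -0.7]))
w = binarize(np.array([1.0, 0.4, -0.9, -0.1]))
print(xnor_popcount_dot(to_bits(a), to_bits(w)))  # same value as np.dot(a, w)
print(binary_neuron(to_bits(a), to_bits(w)))
```

In this formulation the multiply-accumulate of a conventional CNN layer reduces to bit agreement counting plus one comparison per output, which is the kind of operation a digital (binary-state) crossbar is suited to.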
