Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks

Exploiting model sparsity to reduce ineffectual computation is a common approach to improving the energy efficiency of DNN inference accelerators. However, because of the tightly coupled crossbar structure, exploiting sparsity in ReRAM-based NN accelerators remains a less explored area. Existing architectural studies of ReRAM-based NN accelerators assume that an entire crossbar array can be activated in a single cycle, yet in practice, inference accuracy considerations force matrix-vector computation to be conducted at a smaller granularity, called an Operation Unit (OU). An OU-based architecture creates a new opportunity to exploit DNN sparsity. In this paper, we propose the first practical Sparse ReRAM Engine that exploits both weight and activation sparsity. Our evaluation shows that the proposed method effectively eliminates ineffectual computation and delivers significant performance improvement and energy savings.
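
Although the abstract only sketches the mechanism, the core idea lends itself to a short illustration. The following Python is a minimal, hypothetical functional model, not the paper's hardware design: a matrix-vector product is tiled into small OU sub-blocks, and any OU whose weight sub-block is all zero (weight sparsity) or whose corresponding input activations are all zero (activation sparsity) is skipped as ineffectual. The OU dimensions `OU_ROWS`/`OU_COLS` and the function name `ou_matvec` are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Assumed OU dimensions for illustration; real OUs activate only a
# handful of wordlines and bitlines per cycle for accuracy reasons.
OU_ROWS = 9   # wordlines driven per OU (hypothetical)
OU_COLS = 8   # bitlines sensed per OU (hypothetical)

def ou_matvec(weights: np.ndarray, activations: np.ndarray):
    """Sketch of OU-granularity matrix-vector multiplication.

    Returns the output vector and the fraction of OUs skipped, i.e.
    the ineffectual computation eliminated by jointly exploiting
    weight and activation sparsity.
    """
    rows, cols = weights.shape
    out = np.zeros(cols)
    total = skipped = 0
    for r in range(0, rows, OU_ROWS):
        a = activations[r:r + OU_ROWS]  # inputs feeding this OU row group
        for c in range(0, cols, OU_COLS):
            w = weights[r:r + OU_ROWS, c:c + OU_COLS]
            total += 1
            # An all-zero weight block or all-zero activation slice
            # contributes nothing to the dot product: skip the OU.
            if not w.any() or not a.any():
                skipped += 1
                continue
            out[c:c + OU_COLS] += a @ w
    return out, skipped / total
```

On a pruned weight matrix with ReLU-induced activation zeros, the skipped-OU fraction reported by this model approximates the ineffectual computation that such an engine could eliminate in hardware.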