PattPIM: A Practical ReRAM-Based DNN Accelerator by Reusing Weight Pattern Repetitions

Weight sparsity has been exploited to improve the energy efficiency of Resistive Random-Access Memory (ReRAM) based DNN accelerators. However, most existing ReRAM-based DNN accelerators assume an over-idealized crossbar architecture and focus mainly on compressing zero weights. In this paper, we propose PattPIM, a novel ReRAM-based accelerator that achieves space compression and computation reuse by exploiting DNN weight patterns on practical ReRAM crossbars. We first thoroughly analyze the weight distribution characteristics of several typical DNN models and observe many repetitions of non-zero weight patterns, which we call weight pattern repetitions (WPRs). Based on this observation, PattPIM employs a WPR-aware DNN engine and a WPR-to-OU (operation unit) mapping scheme to save both storage and computation resources. Furthermore, we adopt an approximate weight pattern transform algorithm that increases the WPR ratio of DNNs, enhancing reuse efficiency with negligible inference accuracy loss. Our evaluation on six DNN models shows that PattPIM delivers significant gains in performance, ReRAM resource efficiency, and energy savings.
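The core idea behind WPR reuse is that, once weights are quantized to the limited precision of ReRAM cells, many OU-sized groups of non-zero weights recur across a layer, so the result of one crossbar computation can stand in for all of its repetitions. As a rough illustration only (not the paper's implementation), the sketch below estimates a WPR ratio for a single weight matrix; the function name wpr_ratio, the OU height of 9 wordlines, and the 16-level quantization are hypothetical choices made for this example.

```python
# A minimal sketch (illustrative assumptions, not the paper's settings):
# estimating the weight-pattern-repetition (WPR) ratio of a quantized
# weight matrix at operation-unit (OU) granularity.
import numpy as np
from collections import Counter

def wpr_ratio(weights: np.ndarray, ou_rows: int = 9, levels: int = 16) -> float:
    """Return the fraction of OU-sized weight patterns that are repetitions.

    weights: 2D weight matrix of a DNN layer (e.g., an unrolled conv kernel).
    ou_rows: number of wordlines activated together in one OU (hypothetical).
    levels:  number of quantization levels per ReRAM cell (hypothetical).
    """
    # Quantize weights to a few discrete levels, since practical ReRAM cells
    # store only limited precision.
    w_max = np.abs(weights).max()
    q = np.round(weights / w_max * (levels // 2)).astype(np.int8)

    # Slice each column into OU-sized segments; each segment is one pattern.
    n_rows = (q.shape[0] // ou_rows) * ou_rows
    segments = q[:n_rows].reshape(-1, ou_rows, q.shape[1])
    patterns = [tuple(segments[i, :, j])
                for i in range(segments.shape[0])
                for j in range(q.shape[1])]

    # Drop all-zero patterns: those are handled by zero-skipping schemes,
    # not by non-zero pattern reuse.
    patterns = [p for p in patterns if any(p)]
    counts = Counter(patterns)

    total = len(patterns)
    unique = len(counts)
    # Every occurrence beyond the first of each pattern is a repetition
    # whose computation could be reused instead of re-executed.
    return (total - unique) / total if total else 0.0

# Usage example: even a random layer shows a non-trivial WPR ratio
# after coarse quantization; trained layers typically show more.
rng = np.random.default_rng(0)
layer = rng.normal(size=(128, 64)).astype(np.float32)
print(f"WPR ratio: {wpr_ratio(layer):.2%}")
```

Measuring repetition at OU rather than whole-row granularity mirrors the constraint of practical crossbars, where only a small block of wordlines and bitlines is activated per cycle, so the OU is the natural unit of both mapping and reuse.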
