A Flexible Yet Efficient DNN Pruning Approach for Crossbar-Based Processing-in-Memory Architectures

Pruning deep neural networks (DNNs) can reduce the model size and thus save hardware resources in a resistive-random-access-memory (ReRAM)-based DNN accelerator. Because of the tightly coupled crossbar structure, existing ReRAM-based pruning techniques prune the weights of a DNN in a structured manner and therefore attain only low pruning ratios. This article presents a novel pruning technique, SegPrune, that prunes the weights of a DNN flexibly on crossbar architectures so as to maximize the pruning ratio while preserving crossbar execution efficiency. We observe that different filters of a weight matrix share a large number of matrix subcolumns (spanning the same rows), called segments, that can be pruned with the same segment shape, in the sense that the weights at the same position within these segments are either simultaneously accuracy-sensitive (and should thus be retained) or simultaneously accuracy-insensitive (and can thus be pruned). Because bit lines in a crossbar are exchangeable, segments sharing the same pruning shape can be assembled into the same crossbar, which keeps crossbar execution efficient. We propose a projection-based shape-voting algorithm to select suitable segment shapes that drive the weight-pruning process. Accordingly, we also introduce a low-overhead data path that can be easily integrated into any existing ReRAM-based DNN accelerator, achieving both a high pruning ratio and high execution efficiency. Our evaluation shows that SegPrune outperforms the state-of-the-art Hybrid-P and FORMS by up to $14.6\times$ and $3.6\times$ in pruning ratio, $13.9\times$ and $3.4\times$ in inference speedup, and $12.5\times$ and $3.1\times$ in energy reduction, respectively, while achieving even higher accuracy at the cost of less than 0.27% extra hardware area overhead.
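
To make the segment-shape idea concrete, the following is a minimal, self-contained NumPy sketch of how projection-based shape voting could be realized for one unrolled weight matrix. It is an illustration under our own assumptions, not the authors' implementation: the function name vote_segment_shapes and the parameters segment_height and keep_per_segment are hypothetical, and the "projection" is modeled simply as an L1-magnitude sum over the filters that share each row group.

import numpy as np

def vote_segment_shapes(W, segment_height=8, keep_per_segment=2):
    """Illustrative shape voting for one unrolled weight matrix W.

    W has shape (rows, filters): each column is one filter, and every
    segment_height consecutive rows of a column form one segment.
    For each row group, the absolute weights of all filters are projected
    (summed) onto the segment positions; the keep_per_segment positions
    with the largest projected magnitude define the shared segment shape,
    and all other positions are pruned in every filter of that group.
    Returns a boolean keep-mask with the same shape as W.
    """
    rows, _ = W.shape
    mask = np.zeros_like(W, dtype=bool)
    for start in range(0, rows, segment_height):
        stop = min(start + segment_height, rows)
        block = np.abs(W[start:stop, :])                    # segments of every filter
        projection = block.sum(axis=1)                      # vote: accumulate magnitude over filters
        keep = np.argsort(projection)[-keep_per_segment:]   # winning positions define the shape
        mask[start + keep, :] = True                        # apply the same shape to all filters
    return mask

# Toy usage: keep 2 of every 8 rows per segment in a random 32x16 matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))
mask = vote_segment_shapes(W)
print("pruning ratio:", 1.0 - mask.mean())

In this simplified sketch every filter in a row group adopts the same winning shape; a more faithful scheme would allow different filter groups to adopt different shapes and then co-locate same-shape segments in the same crossbar, which is possible because bit lines (columns) of a crossbar can be reordered without changing the computed dot products.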

[1] Hsiang-Yun Cheng et al., "RePIM: Joint Exploitation of Activation and Weight Repetitions for In-ReRAM DNN Acceleration," in Proc. 58th ACM/IEEE Design Automation Conference (DAC), 2021.

[2] Xian-He Sun et al., "AUTO-PRUNE: Automated DNN Pruning and Mapping for ReRAM-Based Accelerator," in Proc. ICS, 2021.

[3] Hang Liu et al., "FORMS: Fine-Grained Polarized ReRAM-Based In-Situ Computation for Mixed-Signal DNN Accelerator," in Proc. 48th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), 2021.

[4] Yuchao Yang et al., "NAS4RRAM: Neural Network Architecture Search for Inference on RRAM-Based Accelerators," Science China Information Sciences, 2021.

[5] Xiaochen Peng et al., "Structured Pruning of RRAM Crossbars for Efficient In-Memory Computing Acceleration of Deep Neural Networks," IEEE Transactions on Circuits and Systems II: Express Briefs, 2021.

[6] Xiaolong Ma et al., "TinyADC: Peripheral Circuit-Aware Weight Pruning Framework for Mixed-Signal DNN Accelerators," in Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021.

[7] Jingyu Wang et al., "High Area/Energy Efficiency RRAM CNN Accelerator with Kernel-Reordering Weight Mapping Scheme Based on Pattern Pruning," arXiv preprint, 2020.

[8] Yuhao Zhang et al., "PattPIM: A Practical ReRAM-Based DNN Accelerator by Reusing Weight Pattern Repetitions," in Proc. 57th ACM/IEEE Design Automation Conference (DAC), 2020.

[9] Yanzhi Wang et al., "PIM-Prune: Fine-Grain DCNN Pruning for Crossbar-Based Process-In-Memory Architecture," in Proc. 57th ACM/IEEE Design Automation Conference (DAC), 2020.

[10] Yanzhi Wang et al., "BLK-REW: A Unified Block-Based DNN Pruning Framework Using Reweighted Regularization Method," arXiv preprint, 2020.

[11] Yanzhi Wang et al., "PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-Based Weight Pruning," in Proc. ASPLOS, 2020.

[12] Wei Tang et al., "CASCADE: Connecting RRAMs to Extend Analog Dataflow in an End-to-End In-Memory Processing Paradigm," in Proc. MICRO, 2019.

[13] Yanzhi Wang et al., "Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation," in Proc. 25th Asia and South Pacific Design Automation Conference (ASP-DAC), 2020.

[14] Jieping Ye et al., "AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates," in Proc. AAAI, 2020.

[15] Yanzhi Wang et al., "Non-Structured DNN Weight Pruning—Is It Beneficial in Any Platform?," IEEE Transactions on Neural Networks and Learning Systems, 2019.

[16] Chia-Lin Yang et al., "Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks," in Proc. 46th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), 2019.

[17] Yuan Xie et al., "Learning the Sparsity for ReRAM: Mapping and Pruning Sparse Neural Network for ReRAM-Based Accelerator," in Proc. ASP-DAC, 2019.

[18] Jiayu Li et al., "ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers," in Proc. ASPLOS, 2019.

[19] Jing Liu et al., "Discrimination-Aware Channel Pruning for Deep Neural Networks," in Proc. NeurIPS, 2018.

[20] Houqiang Li et al., "Improving Deep Neural Network Sparsity through Decorrelation Regularization," in Proc. IJCAI, 2018.

[21] Yongqiang Lyu et al., "SNrram: An Efficient Sparse Neural Network Computation Architecture Based on Resistive Random-Access Memory," in Proc. 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 2018.

[22] Yiran Chen et al., "ReCom: An Efficient Resistive Accelerator for Compressed Deep Neural Networks," in Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018.

[23] Yanzhi Wang et al., "Systematic Weight Pruning of DNNs Using Alternating Direction Method of Multipliers," in Proc. ICLR, 2018.

[24] Jianxin Wu et al., "ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression," in Proc. IEEE International Conference on Computer Vision (ICCV), 2017.

[25] Xiangyu Zhang et al., "Channel Pruning for Accelerating Very Deep Neural Networks," in Proc. IEEE International Conference on Computer Vision (ICCV), 2017.

[26] Vivienne Sze et al., "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, 2017.

[27] Mark Sandler et al., "The Power of Sparsity in Convolutional Neural Networks," arXiv preprint, 2017.

[28] Yiran Chen et al., "PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning," in Proc. IEEE International Symposium on High Performance Computer Architecture (HPCA), 2017.

[29] Hanan Samet et al., "Pruning Filters for Efficient ConvNets," in Proc. ICLR, 2017.

[30] Yiran Chen et al., "Learning Structured Sparsity in Deep Neural Networks," in Proc. NIPS, 2016.

[31] Yu Wang et al., "PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory," in Proc. 43rd ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), 2016.

[32] Miao Hu et al., "ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars," in Proc. 43rd ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), 2016.

[33] H. L. Lung et al., "A Study of Array Resistance Distribution and a Novel Operation Algorithm for WOx ReRAM Memory," 2015.

[34] Victor S. Lempitsky et al., "Fast ConvNets Using Group-Wise Brain Damage," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[35] Song Han et al., "Learning both Weights and Connections for Efficient Neural Network," in Proc. NIPS, 2015.

[36] Jimmy Ba et al., "Adam: A Method for Stochastic Optimization," in Proc. ICLR, 2015.

[37] Zhi-Quan Luo et al., "Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

[38] Georg Heigold et al., "Small-Footprint Keyword Spotting Using Deep Neural Networks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.

[39] Xiang Zhang et al., "OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks," in Proc. ICLR, 2014.

[40] Geoffrey E. Hinton et al., "ImageNet Classification with Deep Convolutional Neural Networks," Communications of the ACM, 2017.

[41] Stephen P. Boyd et al., "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," Foundations and Trends in Machine Learning, 2011.

[42] Norman P. Jouppi et al., "CACTI 6.0: A Tool to Model Large Caches," 2009.