AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator

Emerging ReRAM-based accelerators support in-memory computation to accelerate deep neural network (DNN) inference. Weight matrix pruning is a widely used technique to reduce the size of DNN models, thereby reducing the resource and energy consumption of ReRAM-based accelerators. However, prior work on weight matrix pruning for ReRAM-based accelerators has three major issues. First, it relies on heuristics or rules from domain experts to prune the weights, leading to suboptimal pruning policies. Second, it mostly focuses on improving the compression ratio and thus may not meet accuracy constraints. Third, it ignores direct feedback from the hardware. In this paper, we introduce an automated DNN pruning and mapping framework, named AUTO-PRUNE. It leverages reinforcement learning (RL) to automatically determine the pruning policy subject to a constraint on accuracy loss. The reward function of the RL agent is designed around direct hardware feedback (i.e., accuracy and the compression rate of occupied crossbars). This function directs the search for each layer's pruning ratio toward a global optimum while accounting for the characteristics of the individual layers of the DNN model. AUTO-PRUNE then maps the pruned weight matrices onto crossbars so that only nontrivial elements are stored. Finally, to avoid the dislocation problem, we design a new datapath in the ReRAM-based accelerator that leverages the mechanism of the operation units to correctly index and feed inputs to the matrix-vector computation. Experimental results show that, compared to state-of-the-art work, AUTO-PRUNE achieves up to a 3.3X compression rate, 3.1X area efficiency, and 3.3X energy efficiency with similar or even higher accuracy.
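The abstract does not include code, but the search loop it describes can be sketched as follows: an agent proposes a pruning ratio per layer, the pruned model returns direct hardware feedback (accuracy and crossbar compression rate), and a constrained reward steers the search. This is a minimal, self-contained illustration only; the toy accuracy/compression model, the reward form, and the random search standing in for the paper's RL agent are all assumptions, not AUTO-PRUNE's actual implementation.

```python
# Illustrative sketch of an RL-style, layer-wise pruning-ratio search with a
# hardware-feedback reward. All names and the toy evaluation below are
# hypothetical; a random search stands in for the paper's RL agent.
import random

LAYERS = ["conv1", "conv2", "fc1"]   # toy 3-layer model (assumed)
ACC_BASELINE = 0.92                  # unpruned accuracy (assumed)
ACC_LOSS_BUDGET = 0.01               # allowed accuracy drop (assumed)

def evaluate(ratios):
    """Stand-in for prune-retrain-measure: in this toy model, accuracy
    degrades as more weight is pruned, and crossbar compression grows
    with the mean pruning ratio."""
    mean_ratio = sum(ratios) / len(ratios)
    accuracy = ACC_BASELINE - 0.05 * mean_ratio ** 2   # fabricated toy curve
    compression = 1.0 / (1.0 - mean_ratio + 1e-6)      # crossbars saved
    return accuracy, compression

def reward(accuracy, compression):
    """Hardware-direct reward: a large penalty if the accuracy-loss
    constraint is violated, otherwise the crossbar compression rate."""
    if ACC_BASELINE - accuracy > ACC_LOSS_BUDGET:
        return -1.0
    return compression

best_ratios, best_r = None, float("-inf")
for episode in range(200):           # random search in place of an RL agent
    ratios = [random.uniform(0.0, 0.95) for _ in LAYERS]
    acc, comp = evaluate(ratios)
    r = reward(acc, comp)
    if r > best_r:
        best_ratios, best_r = ratios, r

print("best per-layer ratios:", [round(x, 2) for x in best_ratios])
print("best reward (compression rate):", round(best_r, 2))
```

Under these assumptions the search settles on the largest mean pruning ratio that keeps the accuracy drop within budget, which mirrors the abstract's goal of maximizing crossbar compression under an accuracy-loss constraint.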
