Crossbar-Aware Neural Network Pruning

Crossbar architectures have been widely adopted in neural network accelerators because they implement vector-matrix multiplication efficiently. However, for convolutional neural networks (CNNs), this efficiency is compromised dramatically by the large amount of data reuse. Although some mapping methods have been designed to balance execution throughput against resource overhead, the resource consumption required to maintain throughput remains huge. Network pruning is a promising and widely studied way to shrink model size, but prior work on CNN compression rarely considers the crossbar architecture and its corresponding mapping method, and therefore cannot be directly used by crossbar-based neural network accelerators. This paper proposes a crossbar-aware pruning framework based on a formulated $L_{0}$-norm constrained optimization problem. Specifically, we design an $L_{0}$-norm constrained gradient descent with relaxant probabilistic projection to solve this problem. Two types of sparsity are achieved: 1) intuitive crossbar-grain sparsity and 2) column-grain sparsity with output recombination, based on which we further propose an input feature map reorder method to improve model accuracy. We evaluate our crossbar-aware pruning framework on the medium-scale CIFAR10 data set and the large-scale ImageNet data set with VGG and ResNet models. Our method reduces the crossbar overhead by 44%–72% with negligible accuracy degradation. This work significantly reduces the resource overhead and the associated energy cost, and provides a new co-design solution for mapping CNNs onto various crossbar devices with much better efficiency.
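
To make the projection idea more concrete, the sketch below shows one plausible reading of a crossbar-grain $L_{0}$ constraint with a probabilistic relaxation: after an ordinary gradient step, weight blocks the size of one crossbar are ranked by norm, the top-$k$ blocks are kept, and each remaining block survives with a small probability. The block size (128x128), the budget `k`, `relax_prob`, and the helper names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def crossbar_group_norms(W, rows_per_xbar=128, cols_per_xbar=128):
    """L2 norm of each crossbar-sized block of a weight matrix (illustrative grouping)."""
    R, C = W.shape
    norms = []
    for r in range(0, R, rows_per_xbar):
        for c in range(0, C, cols_per_xbar):
            norms.append(np.linalg.norm(W[r:r + rows_per_xbar, c:c + cols_per_xbar]))
    return np.array(norms)

def relaxed_l0_projection(W, k, relax_prob=0.1, rows_per_xbar=128, cols_per_xbar=128):
    """Zero out all but roughly k crossbar blocks.

    The k blocks with the largest norms are always kept; every other block
    survives with probability `relax_prob`, loosening the hard L0 constraint
    (a hypothetical relaxation, standing in for the paper's probabilistic projection).
    """
    R, C = W.shape
    norms = crossbar_group_norms(W, rows_per_xbar, cols_per_xbar)
    order = np.argsort(-norms)                        # blocks sorted by importance
    keep = np.zeros_like(norms, dtype=bool)
    keep[order[:k]] = True                            # hard L0 budget
    keep |= np.random.rand(norms.size) < relax_prob   # probabilistic relaxation
    W_proj = np.zeros_like(W)
    idx = 0
    for r in range(0, R, rows_per_xbar):
        for c in range(0, C, cols_per_xbar):
            if keep[idx]:
                W_proj[r:r + rows_per_xbar, c:c + cols_per_xbar] = \
                    W[r:r + rows_per_xbar, c:c + cols_per_xbar]
            idx += 1
    return W_proj

def l0_constrained_step(W, grad_fn, lr=0.01, k=16, relax_prob=0.1):
    """One projected-gradient step (sketch): descend on the loss, then project.

    `grad_fn`, `lr`, and `k` are placeholders for the real training loop.
    """
    W = W - lr * grad_fn(W)
    return relaxed_l0_projection(W, k, relax_prob)
```

Pruned blocks correspond to whole crossbars (or, with a finer grouping, crossbar columns) that never need to be programmed, which is how sparsity at this granularity translates directly into crossbar overhead savings.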
