Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach

Neural networks have rapidly become the dominant algorithms in a broad range of applications such as image recognition, speech recognition, and natural language processing, where they achieve state-of-the-art performance. However, as neural networks keep moving towards deeper and larger architectures, the amount of data and computation they require poses a great challenge. Although sparsity has emerged as an effective way to directly reduce computation and memory accesses, the irregularity caused by sparsity (in both synapses and neurons) prevents accelerators from fully exploiting its benefits and also requires costly indexing modules in accelerators. In this paper, we propose a cooperative software/hardware approach to address the irregularity of sparse neural networks efficiently. We first observe local convergence: during training, larger weights tend to gather into small clusters. Based on this key observation, we propose a software-based coarse-grained pruning technique that drastically reduces the irregularity of sparse synapses. Coarse-grained pruning, together with local quantization, significantly reduces the size of indexes and improves the network compression ratio. We further design a hardware accelerator, Cambricon-S, to efficiently handle the remaining irregularity of sparse synapses and neurons. The accelerator features a novel selector module that filters out unnecessary synapses and neurons. Compared with a state-of-the-art sparse neural network accelerator, our accelerator achieves 1.71× better performance and 1.37× better energy efficiency.
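To make the software side concrete, the sketch below illustrates one plausible form of coarse-grained pruning: weights are scored and removed in whole blocks, so a single index entry covers an entire block rather than an individual weight, which is what shrinks the index. This is a minimal sketch under assumed details; the block shape, the absolute-sum scoring rule, and the quantile threshold are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def coarse_grained_prune(weights, block_shape=(4, 4), sparsity=0.5):
    """Block-wise magnitude pruning sketch (assumed details, not the paper's
    exact algorithm): entire blocks are kept or zeroed together, so the
    sparsity index only needs one entry per block."""
    rows, cols = weights.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0, "weights must tile evenly into blocks"

    # Score each block by the sum of absolute weights inside it.
    blocks = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))

    # Keep the highest-scoring blocks, zero out the rest.
    threshold = np.quantile(scores, sparsity)
    block_mask = scores >= threshold
    mask = np.repeat(np.repeat(block_mask, br, axis=0), bc, axis=1)
    pruned = weights * mask

    # Index cost: one mask bit per block (coarse-grained) vs. one per weight
    # (fine-grained element-wise pruning), illustrating the index reduction.
    coarse_index_bits = block_mask.size
    fine_index_bits = weights.size
    return pruned, block_mask, coarse_index_bits, fine_index_bits

# Example: a 16x16 weight matrix pruned to ~50% block sparsity.
w = np.random.randn(16, 16)
pruned, block_mask, coarse_bits, fine_bits = coarse_grained_prune(w)
print(f"index size: {coarse_bits} bits (coarse-grained) vs {fine_bits} bits (fine-grained)")
```

With a 4×4 block, the index is 16× smaller than element-wise pruning for the same layer, which is the kind of saving the abstract attributes to coarse-grained pruning; the hardware selector can then skip whole pruned blocks instead of testing individual weights.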
