Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks

The use of Deep Neural Networks (DNNs) on resource-constrained edge devices has been limited by their high computation and large memory requirements. In this work, we propose an algorithm to compress DNNs by jointly optimizing structured-sparsity and quantization constraints in a single training framework. The proposed algorithm has been extensively validated on high- and low-capacity DNNs and on wide and deep sparse DNNs. Further, we perform a Pareto-optimal analysis to extract the best models from a large set of trained DNNs. The optimal structurally compressed DNN model achieves roughly 50X weight-memory reduction with no test-accuracy degradation, compared to the uncompressed floating-point DNN.
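To make the joint optimization concrete, here is a minimal PyTorch sketch of the general idea: a convolution layer whose weights are uniformly quantized in the forward pass through a straight-through estimator, trained against a task loss plus a group-lasso penalty over output-channel groups (structured sparsity). This is an illustration of joint quantization/structured-sparsity training under assumed settings, not the paper's exact formulation; `QuantConv2d`, `group_lasso`, `train_step`, and the hyperparameters `bits` and `lam` are hypothetical names and values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantConv2d(nn.Conv2d):
    """Conv layer whose weights are uniformly quantized in the forward pass;
    the straight-through estimator lets gradients reach the full-precision copy."""
    def __init__(self, *args, bits=4, **kwargs):
        super().__init__(*args, **kwargs)
        self.bits = bits  # assumed bit-width; not taken from the paper

    def forward(self, x):
        qmax = 2 ** (self.bits - 1) - 1
        scale = self.weight.abs().max() / qmax + 1e-12
        w_q = torch.round(self.weight / scale).clamp(-qmax, qmax) * scale
        w_ste = self.weight + (w_q - self.weight).detach()  # forward: quantized, backward: identity
        return F.conv2d(x, w_ste, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

def group_lasso(model):
    """Group-lasso penalty: L2 norm of each output-channel (filter) group, summed,
    which drives whole filters toward zero (structured sparsity)."""
    return sum(m.weight.flatten(1).norm(dim=1).sum()
               for m in model.modules() if isinstance(m, nn.Conv2d))

def train_step(model, x, y, opt, lam=1e-4):
    """One training step: task loss plus structured-sparsity penalty on quantized weights."""
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * group_lasso(model)
    loss.backward()
    opt.step()
    return loss.item()
```

Sweeping `bits` and `lam` over a grid and plotting accuracy against compressed model size would then yield the kind of model set from which a Pareto-optimal front can be extracted, as the abstract describes.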
