Compressing Deep Neural Networks With Sparse Matrix Factorization

Modern deep neural networks (DNNs) are typically overparameterized, containing a large number of learnable parameters. One effective line of solutions compresses DNN models by learning sparse weights and connections. In this article, we follow this line of research and present an alternative framework for learning sparse DNNs with the assistance of matrix factorization. We provide an underlying principle for substituting the original parameter matrices with products of highly sparse ones, which constitutes the theoretical basis of our method. Experimental results demonstrate that our method substantially outperforms previous state-of-the-art methods for compressing various DNNs, providing rich empirical evidence of its effectiveness. It is also worth noting that, unlike many other works that focus only on feedforward networks such as multilayer perceptrons and convolutional neural networks, we also evaluate our method on a series of recurrent networks.
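
To make the core substitution concrete, below is a minimal NumPy sketch: a dense weight matrix W is approximated by the product of two highly sparse factors U and V. The function name sparse_factorize, the gradient-descent fitting of the factors, and the final magnitude-pruning step are illustrative assumptions for exposition, not the exact procedure of the paper.

```python
import numpy as np

def sparse_factorize(W, rank, sparsity=0.9, iters=500, lr=1e-2, seed=0):
    """Approximate a dense matrix W (m x n) as U @ V with sparse factors.

    Illustrative sketch only: gradient descent on the Frobenius
    reconstruction error, followed by magnitude pruning of each factor.
    """
    rng = np.random.default_rng(seed)
    m, n = W.shape
    U = rng.standard_normal((m, rank)) * 0.1
    V = rng.standard_normal((rank, n)) * 0.1

    for _ in range(iters):
        R = U @ V - W               # reconstruction residual
        gU = R @ V.T                # gradient of 0.5 * ||UV - W||_F^2 w.r.t. U
        gV = U.T @ R                # gradient w.r.t. V
        U -= lr * gU
        V -= lr * gV

    # Magnitude pruning: zero out all but the largest-magnitude entries.
    for M in (U, V):
        thresh = np.quantile(np.abs(M), sparsity)
        M[np.abs(M) < thresh] = 0.0

    return U, V

# Example: a 64x64 dense layer replaced by two 90%-sparse factors.
W = np.random.default_rng(1).standard_normal((64, 64))
U, V = sparse_factorize(W, rank=32, sparsity=0.9)
dense_params = W.size                                       # 4096
sparse_params = np.count_nonzero(U) + np.count_nonzero(V)   # ~410
```

At 90% sparsity and rank m/2, the two factors above store roughly a tenth of the dense parameter count, which is the kind of storage-versus-accuracy trade-off such a framework targets.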
