Tensorizing Neural Networks

Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, the commonly used fully-connected layers require a large amount of memory, which makes it hard to deploy the models on low-end devices and limits further growth of the model size. In this paper we convert the dense weight matrices of the fully-connected layers to the Tensor Train [9] format, so that the number of parameters is reduced by orders of magnitude while the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks [28] we report a compression factor of up to 200,000× for the dense weight matrix of a fully-connected layer, leading to a compression factor of up to 7× for the whole network.
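The core idea can be illustrated with a minimal sketch of the TT-SVD procedure [9]: reshape a dense weight matrix into a higher-order tensor, then factor it into a chain of small 3-way "cores" via sequential truncated SVDs. This is an illustrative NumPy implementation, not the authors' code; the function names and the fixed `max_rank` truncation rule are assumptions made for clarity.

```python
import numpy as np

def tt_decompose(tensor, max_rank):
    """Decompose a d-way tensor into TT cores via sequential truncated SVDs (TT-SVD).

    Illustrative sketch: a single max_rank cap is used for every TT-rank,
    whereas in practice per-mode ranks are usually tuned individually.
    """
    dims = tensor.shape
    d = len(dims)
    cores = []
    rank = 1  # TT-rank on the left of the current mode (r_0 = 1)
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(d - 1):
        # Truncated SVD separates mode k from the remaining modes.
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(S))
        cores.append(U[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Carry the remainder S @ Vt forward and fold in the next mode.
        mat = (np.diag(S[:new_rank]) @ Vt[:new_rank]).reshape(
            new_rank * dims[k + 1], -1)
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))  # r_d = 1
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))  # drop the boundary ranks r_0 = r_d = 1
```

For example, a 1024×1024 weight matrix (about 10^6 parameters) can be reshaped into a tensor with many small modes; storing only the TT cores with modest ranks then requires far fewer parameters, which is the source of the compression factors reported above. When `max_rank` is large enough to keep all singular values, the decomposition is exact.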

[1] L. Tucker, et al. Some mathematical notes on three-mode factor analysis, 1966, Psychometrika.

[2] J. Chang, et al. Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition, 1970.

[3] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[4] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[5] K. Asanović. Experimental Determination of Precision Requirements for Back-propagation Training of Artificial Neural Networks, 1991.

[6] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[7] W. Hackbusch, et al. A New Scheme for the Tensor Representation, 2009.

[8] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[9] Ivan Oseledets, et al. Tensor-Train Decomposition, 2011, SIAM J. Sci. Comput.

[10] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[11] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[12] Ebru Arisoy, et al. Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13] Yifan Gong, et al. Restructuring of deep neural network acoustic models with singular value decomposition, 2013, INTERSPEECH.

[14] J. Cunningham, et al. Scaling Multidimensional Inference for Structured Gaussian Processes, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.

[16] Misha Denil, et al. Predicting Parameters in Deep Learning, 2014.

[17] Anton Rodomanov, et al. Putting MRFs on a Tensor Train, 2014, ICML.

[18] Rich Caruana, et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.

[19] Joan Bruna, et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation, 2014, NIPS.

[20] Ming Yang, et al. Compressing Deep Convolutional Networks using Vector Quantization, 2014, arXiv.

[21] Andrea Vedaldi, et al. MatConvNet: Convolutional Neural Networks for MATLAB, 2014, ACM Multimedia.

[22] Xiu Yang, et al. Enabling High-Dimensional Hierarchical Uncertainty Quantification by ANOVA and Tensor-Train Decomposition, 2014, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[23] Le Song, et al. Deep Fried Convnets, 2014, IEEE International Conference on Computer Vision (ICCV).

[24] Elad Gilboa, et al. Scaling Multidimensional Inference for Structured Gaussian Processes, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Ivan V. Oseledets, et al. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition, 2014, ICLR.

[26] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.

[27] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.

[28] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[29] Yixin Chen, et al. Compressing Neural Networks with the Hashing Trick, 2015, ICML.