Wide Compression: Tensor Ring Nets

Deep neural networks have demonstrated state-of-the-art performance in a variety of real-world applications. To obtain these performance gains, the networks have grown larger and deeper, containing millions or even billions of parameters and over a thousand layers. The tradeoff is that such large architectures demand an enormous amount of memory, storage, and computation, which limits their usability. Inspired by the recent tensor ring factorization, we introduce Tensor Ring Networks (TR-Nets), which significantly compress both the fully connected layers and the convolutional layers of deep neural networks. Our results show that TR-Nets can compress LeNet-5 by 11× without losing accuracy, and can compress the state-of-the-art Wide ResNet by 243× with only 2.3% accuracy degradation on CIFAR-10 image classification. Overall, this compression scheme shows promise for scientific computing and deep learning, especially for emerging resource-constrained devices such as smartphones, wearables, and IoT devices.
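To make the compression idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of how a fully connected weight matrix can be stored as tensor-ring cores and reconstructed on the fly. The mode sizes (32, 32, 32, 32) and rank 8 are illustrative assumptions, chosen only to show the parameter-count savings.

```python
import numpy as np

def tr_reconstruct(cores, shape):
    """Rebuild a full tensor from its tensor-ring (TR) cores.

    Each core has shape (r_k, n_k, r_{k+1}); the last rank wraps around to
    the first, closing the chain of matrix products into a ring.
    """
    result = cores[0]                                   # (r_0, n_0, r_1)
    for core in cores[1:]:
        r0 = result.shape[0]
        # Contract the shared rank index, then flatten the mode indices
        # so `result` stays in the form (r_0, n_0*...*n_k, r_{k+1}).
        result = np.einsum('amb,bnc->amnc', result, core)
        result = result.reshape(r0, -1, core.shape[-1])
    # Closing the ring = trace over the rank index that wraps around.
    return np.einsum('ama->m', result).reshape(shape)

# Hypothetical example: store a 1024 x 1024 fully connected weight matrix
# as four TR cores of rank 8 over the reshaped modes (32, 32, 32, 32).
rank, modes = 8, (32, 32, 32, 32)
cores = [0.01 * np.random.randn(rank, n, rank) for n in modes]
W = tr_reconstruct(cores, modes).reshape(1024, 1024)

dense_params = 1024 * 1024                  # 1,048,576 weights
tr_params = sum(c.size for c in cores)      # 4 * 8*32*8 = 8,192 weights
print(f"compression ratio ~ {dense_params / tr_params:.0f}x")
```

In this illustrative setting the cores hold roughly 128× fewer parameters than the dense matrix; the ratios reported in the abstract come from applying the same factorization to both fully connected and convolutional layers and training the cores directly.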
