Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs

Recurrent neural networks (RNNs) are powerful in the tasks oriented to sequential data, such as natural language processing and video recognition. However, since the modern RNNs, including long-short term memory (LSTM) and gated recurrent unit (GRU) networks, have complex topologies and expensive space/computation complexity, compressing them becomes a hot and promising topic in recent years. Among plenty of compression methods, tensor decomposition, e.g., tensor train (TT), block term (BT), tensor ring (TR) and hierarchical Tucker (HT), appears to be the most amazing approach since a very high compression ratio might be obtained. Nevertheless, none of these tensor decomposition formats can provide both the space and computation efficiency. In this paper, we consider to compress RNNs based on a novel Kronecker CANDECOMP/PARAFAC (KCP) decomposition, which is derived from Kronecker tensor (KT) decomposition, by proposing two fast algorithms of multiplication between the input and the tensor-decomposed weight. According to our experiments based on UCF11, Youtube Celebrities Face and UCF50 datasets, it can be verified that the proposed KCP-RNNs have comparable performance of accuracy with those in other tensor-decomposed formats, and even 278,219x compression ratio could be obtained by the low rank KCP. More importantly, KCP-RNNs are efficient in both space and computation complexity compared with other tensor-decomposed ones under similar ranks. Besides, we find KCP has the best potential for parallel computing to accelerate the calculations in neural networks.

[1]  Cordelia Schmid,et al.  Long-Term Temporal Convolutions for Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Andrzej Cichocki,et al.  On Revealing Replicating Structures in Multiway Data: A Novel Tensor Decomposition Approach , 2012, LVA/ICA.

[3]  Jürgen Schmidhuber,et al.  LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[4]  Ivan Oseledets,et al.  Tensor-Train Decomposition , 2011, SIAM J. Sci. Comput..

[5]  Niraj K. Jha,et al.  Grow and Prune Compact, Fast, and Accurate LSTMs , 2018, IEEE Transactions on Computers.

[6]  Satoshi Nakamura,et al.  Compressing recurrent neural network with tensor train , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[7]  Lieven De Lathauwer,et al.  Decompositions of a Higher-Order Tensor in Block Terms - Part II: Definitions and Uniqueness , 2008, SIAM J. Matrix Anal. Appl..

[8]  Christopher J. Hillar,et al.  Most Tensor Problems Are NP-Hard , 2009, JACM.

[9]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[10]  Bo Yuan,et al.  Compressing Recurrent Neural Networks Using Hierarchical Tucker Tensor Decomposition , 2020, ArXiv.

[11]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[12]  Andrzej Cichocki,et al.  From basis components to complex structural patterns , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Ran El-Yaniv,et al.  Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res..

[14]  Alexander Novikov,et al.  Tensorizing Neural Networks , 2015, NIPS.

[15]  Junjun Jiang,et al.  Semisupervised Discriminant Multimanifold Analysis for Action Recognition , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[16]  Vladimir Pavlovic,et al.  Face tracking and recognition with visual constraints in real-world videos , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Vittorio Murino,et al.  Efficient pooling of image based CNN features for action recognition in videos , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Ruslan Salakhutdinov,et al.  Action Recognition using Visual Attention , 2015, NIPS 2015.

[19]  Andrzej Cichocki,et al.  Tensor Networks for Dimensionality Reduction and Large-scale Optimization: Part 1 Low-Rank Tensor Decompositions , 2016, Found. Trends Mach. Learn..

[20]  Zenglin Xu,et al.  Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Jiebo Luo,et al.  Recognizing realistic actions from videos “in the wild” , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Zhen Cui,et al.  Recurrent Regression for Face Recognition , 2016, ArXiv.

[23]  Reinhold Schneider,et al.  Optimization problems in contracted tensor networks , 2011, Comput. Vis. Sci..

[24]  S. V. Dolgov,et al.  ALTERNATING MINIMAL ENERGY METHODS FOR LINEAR SYSTEMS IN HIGHER DIMENSIONS∗ , 2014 .

[25]  Volker Tresp,et al.  Tensor-Train Recurrent Neural Networks for Video Classification , 2017, ICML.

[26]  Lei Deng,et al.  Hybrid Tensor Decomposition in Neural Network Compression , 2020, Neural Networks.

[27]  Masashi Sugiyama,et al.  Learning Efficient Tensor Representations with Ring-structured Networks , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28]  Zhizheng Wu,et al.  Investigating gated recurrent neural networks for speech synthesis , 2016, ArXiv.

[29]  Ivan Oseledets,et al.  Expressive power of recurrent neural networks , 2017, ICLR.

[30]  Yoshua Bengio,et al.  Architectural Complexity Measures of Recurrent Neural Networks , 2016, NIPS.

[31]  Lars Grasedyck,et al.  Hierarchical Singular Value Decomposition of Tensors , 2010, SIAM J. Matrix Anal. Appl..

[32]  Andrzej Cichocki,et al.  Tensor Networks for Dimensionality Reduction, Big Data and Deep Learning , 2018, Advances in Data Analysis with Computational Intelligence Methods.

[33]  Gang Wang,et al.  Multi-manifold deep metric learning for image set classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Yuan Xie,et al.  Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey , 2020, Proceedings of the IEEE.

[35]  Jun Yang,et al.  Human action recognition based on multi-mode spatial-temporal feature fusion , 2019, 2019 22th International Conference on Information Fusion (FUSION).

[36]  Andrzej Cichocki,et al.  Tensor Networks for Latent Variable Analysis: Higher Order Canonical Polyadic Decomposition , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[37]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.