Neural Networks Compression for Language Modeling

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks used in language modeling, suffer from either high space complexity or substantial inference time. This problem is especially critical for mobile applications, in which constant interaction with a remote server is undesirable. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor-train decomposition for LSTM networks in terms of model size and suitability for fast inference.
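To make the comparison concrete, the sketch below illustrates two of the four techniques on a single weight matrix: magnitude pruning (zeroing the smallest-magnitude weights) and low-rank factorization via a truncated SVD. This is a minimal illustration only, not the paper's actual procedure; the sparsity level, rank, and matrix dimensions are hypothetical, whereas the paper applies such transformations to the weight matrices of an LSTM language model trained on PTB.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero out the smallest-magnitude entries of W (illustrative sparsity level)."""
    k = int(sparsity * W.size)
    threshold = np.partition(np.abs(W).ravel(), k)[k]
    return np.where(np.abs(W) >= threshold, W, 0.0)

def low_rank_factorize(W, rank=64):
    """Approximate W ~= A @ B using a truncated SVD (illustrative rank)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, rank)
    B = Vt[:rank, :]             # shape (rank, n)
    return A, B

# Hypothetical LSTM input-to-hidden weight matrix: 4 gates, 650 hidden units.
hidden, emb = 650, 650
W = np.random.randn(4 * hidden, emb).astype(np.float32)

W_sparse = magnitude_prune(W, sparsity=0.9)
A, B = low_rank_factorize(W, rank=64)

print("nonzeros after pruning:", np.count_nonzero(W_sparse), "of", W.size)
print("params after factorization:", A.size + B.size, "vs original", W.size)
```

In both cases the trade-off is the same one the paper studies: fewer stored parameters (or cheaper matrix products at inference time) against a possible increase in perplexity, which typically requires fine-tuning the compressed model to recover.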
