LEARNING INTRINSIC SPARSE STRUCTURES WITHIN LONG SHORT-TERM MEMORY

Model compression is essential for the wide adoption of Recurrent Neural Networks (RNNs), both on user devices with limited resources and in business clusters that must respond quickly to large-scale service requests. This work aims to learn structurally sparse Long Short-Term Memory (LSTM) by reducing the sizes of the basic structures within LSTM units, including input updates, gates, hidden states, cell states and outputs. Reducing the size of each basic structure independently can produce inconsistent dimensions among them and, consequently, invalid LSTM units. To overcome this problem, we propose Intrinsic Sparse Structures (ISS) in LSTMs. Removing one component of ISS simultaneously decreases the sizes of all basic structures by one, so dimension consistency is always maintained. By learning ISS within LSTM units, the obtained LSTMs remain regular while having much smaller basic structures. Based on group Lasso regularization, our method achieves a 10.59× speedup without any loss of perplexity on language modeling of the Penn TreeBank dataset. It is also successfully evaluated on machine Question Answering of the SQuAD dataset with a compact model of only 2.69M weights. Our approach further extends to non-LSTM RNNs, such as Recurrent Highway Networks (RHNs). Our source code is publicly available.
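To make the idea concrete, the following is a minimal, hedged sketch (not the authors' implementation) of a group Lasso penalty whose groups are the per-hidden-unit ISS components of a single LSTM layer. The function name `iss_group_lasso`, the regularization strength `lam`, and the restriction to one layer are assumptions for this illustration; a full ISS group would also include the weights in subsequent layers that consume the LSTM's output, and biases are omitted for brevity.

```python
# Hedged sketch: group Lasso over one ISS group per hidden unit of an LSTM layer.
import torch
import torch.nn as nn

def iss_group_lasso(lstm: nn.LSTM, lam: float = 1e-4) -> torch.Tensor:
    """Sum of L2 norms of per-hidden-unit ISS groups of a single-layer nn.LSTM.

    For hidden unit k, the group collects the weights that produce that unit
    (rows k, k+H, k+2H, k+3H of weight_ih/weight_hh, i.e. all four gates in
    PyTorch's i, f, g, o ordering) and the weights that read it back
    (column k of weight_hh).
    """
    H = lstm.hidden_size
    W_ih = lstm.weight_ih_l0          # shape (4H, input_size)
    W_hh = lstm.weight_hh_l0          # shape (4H, H)
    penalty = torch.zeros((), device=W_ih.device)
    for k in range(H):
        rows = [k, k + H, k + 2 * H, k + 3 * H]       # gate rows for unit k
        group = torch.cat([
            W_ih[rows, :].reshape(-1),                # weights producing unit k
            W_hh[rows, :].reshape(-1),
            W_hh[:, k].reshape(-1),                   # recurrent weights reading unit k
        ])
        penalty = penalty + group.norm(2)
    return lam * penalty

# Usage: add the penalty to the task loss before the backward pass.
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
x = torch.randn(8, 20, 128)
out, _ = lstm(x)
loss = out.pow(2).mean() + iss_group_lasso(lstm, lam=1e-4)
loss.backward()
```

After training, ISS groups whose norm falls below a threshold can be removed from all weight matrices at once, shrinking every basic structure by one per removed group while keeping the LSTM regular and structurally valid.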
