Semi-tied Units for Efficient Gating in LSTM and Highway Networks

Gating is a key technique used for integrating information from multiple sources by long short-term memory (LSTM) models and has recently also been applied to other models such as the highway network. Although gating is powerful, it is rather expensive in terms of both computation and storage as each gating unit uses a separate full weight matrix. This issue can be severe since several gates can be used together in e.g. an LSTM cell. This paper proposes a semi-tied unit (STU) approach to solve this efficiency issue, which uses one shared weight matrix to replace those in all the units in the same layer. The approach is termed "semi-tied" since extra parameters are used to separately scale each of the shared output values. These extra scaling factors are associated with the network activation functions and result in the use of parameterised sigmoid, hyperbolic tangent, and rectified linear unit functions. Speech recognition experiments using British English multi-genre broadcast data showed that using STUs can reduce the calculation and storage cost by a factor of three for highway networks and four for LSTMs, while giving similar word error rates to the original models.

[1]  Andrew W. Senior,et al.  Fast and accurate recurrent neural network acoustic models for speech recognition , 2015, INTERSPEECH.

[2]  Tara N. Sainath,et al.  Highway-LSTM and Recurrent Highway Networks for Speech Recognition , 2017, INTERSPEECH.

[3]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[4]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..

[5]  Unfolded recurrent neural networks for speech recognition , 2014, INTERSPEECH.

[6]  Yu Zhang,et al.  Very deep convolutional networks for end-to-end speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Mark J. F. Gales,et al.  Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems , 2016, INTERSPEECH.

[8]  Danilo P. Mandic,et al.  Recurrent neural networks with trainable amplitude of activation functions , 2003, Neural Networks.

[9]  Steve Renals,et al.  Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation , 2016, IEEE ACM Trans. Audio Speech Lang. Process..

[10]  Richard Socher,et al.  Quasi-Recurrent Neural Networks , 2016, ICLR.

[11]  Ekaterina Vylomova,et al.  Depth-Gated LSTM , 2015, ArXiv.

[12]  Mark J. F. Gales,et al.  Cambridge university transcription systems for the multi-genre broadcast challenge , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[13]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Yu Zhang,et al.  Highway long short-term memory RNNS for distant speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Susan Fitt,et al.  On generating combilex pronunciations via morphological analysis , 2010, INTERSPEECH.

[16]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[17]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[18]  Chao Zhang,et al.  A general artificial neural network extension for HTK , 2015, INTERSPEECH.

[19]  Liang Lu,et al.  Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition , 2015, INTERSPEECH.

[20]  Hermann Ney,et al.  Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21]  Chao Zhang,et al.  Joint training methods for tandem and hybrid speech recognition systems using deep neural networks , 2017 .

[22]  Yifan Gong,et al.  Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[23]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[24]  Chin-Hui Lee,et al.  Experimental studies on continuous speech recognition using neural architectures with “adaptive” hidden activation functions , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25]  Philip C. Woodland,et al.  High Order Recurrent Neural Networks for Acoustic Modelling , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[26]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[27]  Liang Lu,et al.  On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[29]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[30]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[31]  S. M. Siniscalchi,et al.  Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[32]  Gunnar Evermann,et al.  Large vocabulary decoding and confidence estimation using word posterior probabilities , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[33]  C. Zhang,et al.  DNN speaker adaptation using parameterised sigmoid and ReLU hidden activation functions , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).