Accelerating Recurrent Neural Network Language Model Based Online Speech Recognition System

This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems. First, a lossy compression of the past hidden-layer outputs (the history vector), combined with caching, is introduced to reduce the number of LM queries. Second, the RNNLM computation is deployed in a CPU-GPU hybrid manner, so that each layer of the model runs on the more advantageous platform. The overhead added by data exchange between CPU and GPU is compensated for by a frame-wise batching strategy. Evaluation of the proposed methods on the LibriSpeech test sets indicates that reducing the history-vector precision improves average recognition speed by a factor of 1.23 with minimal degradation in accuracy, while the CPU-GPU hybrid parallelization enables RNNLM-based real-time recognition with a fourfold improvement in speed.
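To make the first idea concrete, the sketch below illustrates one plausible way to cache RNNLM queries keyed on a lossily compressed history vector: quantizing the float-valued recurrent state so that nearly identical histories collapse onto the same cache key and reuse a previously computed score. This is a minimal illustration, not the paper's implementation; `rnnlm_score`, `quantize`, and `CachedRNNLM` are hypothetical names, and the assumed activation range of [-1, 1] is an assumption for the example.

```python
import numpy as np


def rnnlm_score(history, word):
    """Hypothetical stand-in for the expensive RNNLM forward pass.

    It would return the log-probability of `word` given the recurrent
    state `history`; the real model is not reproduced here.
    """
    raise NotImplementedError


def quantize(history, num_bits=8):
    """Lossily compress a float32 history vector to `num_bits` precision.

    Values are assumed to lie in [-1, 1] (e.g. tanh/GRU activations) and
    are mapped to integers, so nearly identical histories share a key.
    """
    levels = (1 << num_bits) - 1
    clipped = np.clip(history, -1.0, 1.0)
    return tuple(np.round((clipped + 1.0) / 2.0 * levels).astype(np.int32))


class CachedRNNLM:
    """Caches LM scores keyed on the quantized history vector and word."""

    def __init__(self, num_bits=8):
        self.num_bits = num_bits
        self.cache = {}

    def score(self, history, word):
        key = (quantize(history, self.num_bits), word)
        if key not in self.cache:
            # Cache miss: fall back to the full (expensive) RNNLM query.
            self.cache[key] = rnnlm_score(history, word)
        return self.cache[key]
```

In such a scheme, lowering `num_bits` raises the cache hit rate (fewer LM queries, hence faster decoding) at the cost of scoring precision, which is the trade-off behind the reported 1.23-times speedup with minimal accuracy loss.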
