Scalable Multi Corpora Neural Language Models for ASR

Neural language models (NLMs) have been shown to outperform conventional n-gram language models by a substantial margin in Automatic Speech Recognition (ASR) and other tasks. There are, however, a number of challenges that must be addressed before an NLM can be used in a practical large-scale ASR system. In this paper, we present solutions to some of these challenges, including training NLMs on heterogeneous corpora, limiting the latency impact, and handling personalized bias in the second-pass rescorer. Overall, we show that a neural LM used in a second-pass n-best rescoring framework achieves a 6.2% relative WER reduction with a minimal increase in latency.
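The second-pass n-best rescoring setup mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the log-linear interpolation weight, the function names, and the toy stand-in for the neural LM are all assumptions for the sake of the example.

```python
# Minimal sketch of second-pass n-best rescoring with a neural LM.
# The interpolation weight and the toy NLM scorer below are illustrative
# assumptions, not details taken from the paper.

def rescore_nbest(nbest, nlm_score, weight=0.5):
    """Re-rank n-best hypotheses by interpolating the first-pass score
    with a neural-LM log-probability (both in the log domain).

    nbest: list of (hypothesis, first_pass_score) pairs.
    nlm_score: callable returning an NLM log-probability for a hypothesis.
    weight: interpolation weight on the NLM score (a tunable assumption).
    """
    rescored = [
        (hyp, (1.0 - weight) * fp + weight * nlm_score(hyp))
        for hyp, fp in nbest
    ]
    # Higher (less negative) combined log-score ranks first.
    return sorted(rescored, key=lambda x: x[1], reverse=True)


# Toy stand-in for a neural LM: assigns a fixed per-word log-cost,
# so it only serves to make the example self-contained and runnable.
def toy_nlm_score(hyp):
    return -0.5 * len(hyp.split())


nbest = [("play music", -4.0), ("play muse sick", -3.8)]
ranked = rescore_nbest(nbest, toy_nlm_score, weight=0.5)
```

In a real system, `nlm_score` would be a forward pass of the trained NLM over the hypothesis, and `weight` would be tuned on a development set.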
