Neural network language models for conversational speech recognition

Recently there has been growing interest in using neural networks for language modeling. In contrast to the well-known backoff n-gram language models (LMs), the neural network approach attempts to limit the problems arising from data sparseness by performing the estimation in a continuous space, which allows smooth interpolation. This type of LM is therefore attractive for tasks where only a very limited amount of in-domain training data is available, such as the modeling of conversational speech. In this paper we analyze the generalization behavior of the neural network LM for in-domain training corpora ranging from 7M to over 21M words. In all cases, significant word error reductions were observed compared to a carefully tuned 4-gram backoff language model in a state-of-the-art conversational speech recognizer for the NIST rich transcription evaluations. We also apply ensemble learning methods and discuss their connection with LM interpolation.
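To make the continuous-space approach and its combination with a backoff LM concrete, the following minimal Python sketch (an illustration, not the authors' implementation) shows a Bengio-style feedforward neural network LM forward pass and a linear interpolation of its probability with a backoff 4-gram probability. The vocabulary size, layer sizes, weights, and interpolation coefficient are arbitrary placeholders.

    # Illustrative sketch: probabilities are estimated in a continuous space
    # (word embeddings + hidden layer + softmax) and linearly interpolated
    # with a backoff n-gram probability. All numbers are placeholders.
    import numpy as np

    rng = np.random.default_rng(0)

    V, d, h, n = 1000, 50, 100, 4                     # vocab size, embedding dim, hidden units, n-gram order
    C = rng.normal(scale=0.1, size=(V, d))            # word embedding (projection) matrix
    H = rng.normal(scale=0.1, size=(h, (n - 1) * d))  # hidden layer weights
    b_h = np.zeros(h)
    U = rng.normal(scale=0.1, size=(V, h))            # output layer weights
    b_o = np.zeros(V)

    def nnlm_probs(context):
        """Forward pass: map the (n-1)-word context to a distribution over the vocabulary."""
        x = np.concatenate([C[w] for w in context])   # continuous representation of the context
        a = np.tanh(H @ x + b_h)                      # hidden layer
        z = U @ a + b_o                               # output scores
        z -= z.max()                                  # numerical stability
        p = np.exp(z)
        return p / p.sum()                            # softmax over the whole vocabulary

    def interpolated_prob(word, context, backoff_prob, lam=0.5):
        """Linear interpolation of the neural network LM with a backoff n-gram LM."""
        return lam * nnlm_probs(context)[word] + (1.0 - lam) * backoff_prob

    # Example: probability of word 42 given a three-word context, with a
    # made-up backoff 4-gram probability of 0.01.
    print(interpolated_prob(42, [5, 17, 256], backoff_prob=0.01))

In practice the interpolation weight would be optimized on held-out data, and combining several neural network LMs (as in ensemble methods) amounts to averaging such distributions before interpolating with the backoff model.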
