Efficient training of large neural networks for language modeling

Recently there has been increasing interest in using neural networks for language modeling. In contrast to the well-known backoff n-gram language models, the neural network approach tries to limit the data sparseness problem by performing the estimation in a continuous space, allowing by this means smooth interpolations. The complexity to train such a model and to calculate one n-gram probability is however several orders of magnitude higher than for the backoff models, making the new approach difficult to use in real applications. In this paper several techniques are presented that allow the use of a neural network language model in a large vocabulary speech recognition system, in particular very, fast lattice rescoring and efficient training of large neural networks on training corpora of over 10 million words. The described approach achieves significant word error reductions with respect to a carefully tuned 4-gram backoff language model in a state of the art conversational speech recognizer for the DARPA rich transcriptions evaluations.

[1]  James Demmel,et al.  Using PHiPAC to speed error back-propagation learning , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Mari Ostendorf,et al.  Relevance weighting for combining multi-domain data for n-gram language modeling , 1999, Comput. Speech Lang..

[3]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[4]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[5]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[6]  Jean-Luc Gauvain,et al.  Connectionist language modeling for large vocabulary continuous speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Ahmad Emami,et al.  Using a connectionist model in a syntactical based language model , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[8]  Blockin Blockin,et al.  Quick Training of Probabilistic Neural Nets by Importance Sampling , 2003 .

[9]  Jean-Luc Gauvain,et al.  Conversational telephone speech recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[10]  Holger Schwenk,et al.  USING CONTINUOUS SPACE LANGUAGE MODELS FOR CONVERSATIONAL SPEECH RECOGNITION , 2003 .