USING NEURAL NETWORK LANGUAGE MODELS FOR LVCSR

In this paper we describe how to use a neural network language model for the BN and CTS task in the RT04 evaluation. The new approach performs the estimation of the language model probabilities in a continuous space, allowing by this means smooth interpolations. Details are given on training data selection, fast training and decoding algorithms and parameter estimation. The neural network language model achieved word error reductions of 0.5% for the CTS task and of 0.3% for the BN task with an additional decoding cost of 0.05xRT.

[1]  James Demmel,et al.  Using PHiPAC to speed error back-propagation learning , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[3]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[4]  Mary P. Harper,et al.  The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources , 2002, EMNLP.

[5]  Jean-Luc Gauvain,et al.  Connectionist language modeling for large vocabulary continuous speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Blockin Blockin,et al.  Quick Training of Probabilistic Neural Nets by Importance Sampling , 2003 .

[7]  Jean-Luc Gauvain,et al.  Conversational telephone speech recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[8]  Holger Schwenk,et al.  USING CONTINUOUS SPACE LANGUAGE MODELS FOR CONVERSATIONAL SPEECH RECOGNITION , 2003 .

[9]  Andreas Stolcke,et al.  The use of a linguistically motivated language model in conversational speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  John Makhoul,et al.  THE 2004 BBN/LIMSI 10xRT ENGLISH BROADCAST NEWS TRANSCRIPTION SYSTEM , 2004 .

[11]  Richard M. Schwartz,et al.  The 2004 BBN/LIMSI 20xRT English conversational telephone speech recognition system , 2005, INTERSPEECH.