Developments in continuous speech dictation using the 1995 ARPA NAB news task

We report on the LIMSI recognizer evaluated in the ARPA 1995 North American Business (NAB) News benchmark test. In contrast to previous evaluations, the new Hub 3 test aims at improving basic speaker-independent (SI) continuous speech recognition (CSR) performance on unlimited-vocabulary read speech recorded under more varied acoustic conditions (background environmental noise and unknown microphones). The LIMSI recognizer is an HMM-based system with Gaussian mixture observation densities. Decoding is carried out in multiple forward acoustic passes. More refined acoustic and language models are used in successive passes, and information is passed from one pass to the next via word graphs. To handle the varied acoustic conditions, channel compensation is performed iteratively, with the noise estimates refined before each of the first three decoding passes. The final decoding pass uses speaker-adapted models obtained by unsupervised adaptation with the maximum likelihood linear regression (MLLR) method. Using the same recognition system, a word error rate of 9.1% was obtained on the Sennheiser microphone data (average SNR 29 dB), compared with 17.5% on the secondary microphone data (average SNR 15 dB).
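The abstract does not spell out how the MLLR adaptation step works. The following is a minimal sketch of generic MLLR mean adaptation under simplifying assumptions: a single global regression class, diagonal covariances, and frame-to-Gaussian occupancies supplied by the caller (in unsupervised adaptation these would come from aligning a previous pass's hypothesis). It uses the standard closed-form solution in which, for diagonal covariances, each row w_i of the transform W solves G_i w_i = k_i, with G_i = Σ_g (1/σ²_{g,i}) (Σ_t γ_g(t)) ξ_g ξ_gᵀ and k_i = Σ_g (1/σ²_{g,i}) (Σ_t γ_g(t) o_{t,i}) ξ_g, where ξ_g = [1, μ_g] is the extended mean. All function names are hypothetical, and this is not the LIMSI implementation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def estimate_mllr_means(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    frames: Sequence[Sequence[float]],
    occupancies: Sequence[Sequence[float]],
) -> Matrix:
    """Estimate one global MLLR mean transform W of shape d x (d+1).

    means[g], variances[g]: mean and diagonal variance of Gaussian g.
    frames[t]: adaptation observation o_t.
    occupancies[t][g]: posterior gamma_g(t) of Gaussian g at frame t.
    The adapted mean of Gaussian g is W @ [1, mu_g].
    """
    d = len(means[0])
    n = d + 1
    transform: Matrix = []
    for i in range(d):  # diagonal covariances: each row of W is solved separately
        g_mat = [[0.0] * n for _ in range(n)]
        k_vec = [0.0] * n
        for g, (mu, var) in enumerate(zip(means, variances)):
            xi = [1.0] + list(mu)  # extended mean vector [1, mu_g]
            occ = sum(gam[g] for gam in occupancies)
            weighted_obs = sum(gam[g] * o[i] for gam, o in zip(occupancies, frames))
            inv_var = 1.0 / var[i]
            for r in range(n):
                k_vec[r] += inv_var * weighted_obs * xi[r]
                for c in range(n):
                    g_mat[r][c] += inv_var * occ * xi[r] * xi[c]
        transform.append(solve_linear(g_mat, k_vec))
    return transform


def adapt_mean(transform: Matrix, mu: Sequence[float]) -> List[float]:
    """Apply the MLLR transform: mu_hat = W @ [1, mu]."""
    xi = [1.0] + list(mu)
    return [sum(w * x for w, x in zip(row, xi)) for row in transform]
```

For example, with two one-dimensional Gaussians at means 0 and 1 and adaptation frames observed at 0.5 and 2.5 (hard occupancies), the estimated transform maps μ to 2μ + 0.5. The system G_i is only invertible when at least d+1 Gaussians with linearly independent extended means have nonzero occupancy, which is one reason practical systems tie transforms across regression classes rather than estimating one per Gaussian.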