Transcribing broadcast news shows

While significant improvements have been made in large vocabulary continuous speech recognition on large read-speech corpora such as the ARPA Wall Street Journal-based CSR corpus (WSJ) for American English and the BREF corpus for French, these tasks remain relatively artificial. In this paper we report on our development work in moving from laboratory read-speech data to real-world speech data in order to build a system for the new ARPA broadcast news transcription task. The LIMSI Nov96 speech recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. The acoustic models are trained on WSJ0/WSJ1 and adapted using MAP estimation with task-specific training data. The overall word error rate on the Nov96 partitioned evaluation test was 27.1%.
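For readers unfamiliar with the MAP adaptation mentioned above: the re-estimated mean of each Gaussian mixture component is a weighted combination of the prior (speaker- or task-independent) mean and the sufficient statistics gathered from the adaptation data. The formula below is a standard textbook sketch of this update, not quoted from the paper; the prior weight \tau, prior mean \mu_{0,m}, adaptation frames x_t, and component occupation probabilities \gamma_m(t) are illustrative notation.

    % Minimal sketch of MAP re-estimation of a Gaussian mixture mean
    \hat{\mu}_m = \frac{\tau \, \mu_{0,m} + \sum_t \gamma_m(t)\, x_t}{\tau + \sum_t \gamma_m(t)}

With little adaptation data the estimate stays close to the prior mean; as the occupation counts grow, it converges toward the maximum likelihood estimate computed from the task-specific data.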