Paper information

Statistical language models for large vocabulary spontaneous speech recognition in Dutch

State-of-the-art large vocabulary automatic recognition systems use a large statistical language model, typically an N-gram. Estimating this model, however, requires a large database of sentences or texts in the same style as the recognition task. For spontaneous speech no such database is available, since it would have to consist of accurate, and therefore expensive, orthographic transcriptions of spoken audio. This paper investigates how readily available large newspaper corpora can be used to improve language models for spontaneous speech recognition, even though the two language styles differ considerably. A technique is proposed that automatically selects appropriate newspaper articles based on perplexity and subsequently uses these texts in the language model estimation. Recognition experiments on spontaneous broadcast speech in Dutch showed significant improvements using this technique.
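The perplexity-based selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring model, the smoothing, or the selection threshold, so everything here is an assumption made for illustration: an add-one smoothed unigram model stands in for the N-gram, and the function names (`train_unigram`, `perplexity`, `score_articles`, `select_articles`) and the `max_perplexity` parameter are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count words in a small set of in-domain (spontaneous speech) transcripts."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    return counts, sum(counts.values())


def perplexity(text, counts, total, vocab_size):
    """Perplexity of `text` under an add-one smoothed unigram model.

    Assumption: a unigram model is a simplified stand-in for the N-gram
    model mentioned in the abstract.
    """
    words = text.lower().split()
    if not words:
        return float("inf")
    log_prob = 0.0
    for w in words:
        # Add-one smoothing gives unseen words a small non-zero probability.
        log_prob += math.log((counts[w] + 1) / (total + vocab_size))
    return math.exp(-log_prob / len(words))


def score_articles(articles, in_domain_sentences):
    """Return (perplexity, article) pairs, in the original article order."""
    counts, total = train_unigram(in_domain_sentences)
    # The smoothing vocabulary covers the in-domain words and the article words.
    vocab = set(counts)
    for article in articles:
        vocab.update(article.lower().split())
    return [(perplexity(a, counts, total, len(vocab)), a) for a in articles]


def select_articles(articles, in_domain_sentences, max_perplexity):
    """Keep articles whose perplexity does not exceed a hypothetical threshold."""
    scored = score_articles(articles, in_domain_sentences)
    return [a for ppl, a in scored if ppl <= max_perplexity]
```

In this sketch, articles whose wording resembles the in-domain transcripts receive a lower perplexity and pass the threshold. The selected texts would then be added to the training data for the language model, which is the second step the abstract mentions.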

Hugo Van hamme | Patrick Wambacq | Jacques Duchateau | Dong Hoon Van Uytsel
