Paper information - Lexicon optimization for Dutch speech recognition in spoken document retrieval

Lexicon optimization for Dutch speech recognition in spoken document retrieval

In this paper, ongoing work concerning the language modelling and lexicon optimization of a Dutch speech recognition system for Spoken Document Retrieval is described: the collection and normalization of a training data set and the optimization of our recognition lexicon. Effects on lexical coverage of the amount of training data, of decompounding compound words and of different selection methods for proper names and acronyms are discussed.

Franciska de Jong | Roeland Ordelman | Arjan van Hessen

[1] Ronald Rosenfeld, et al. Optimizing lexical and N-gram coverage via judicious use of linguistic data, 1995, EUROSPEECH.

[2] Lori Lamel, et al. The Use of Lexica in Automatic Speech Recognition, 2000.

[3] Lori Lamel, et al. Investigating text normalization and pronunciation variants for German broadcast transcription, 2000, INTERSPEECH.

[4] Jean-Luc Gauvain, et al. Language modeling for broadcast news transcription, 1999, EUROSPEECH.

[5] Lori Lamel, et al. Developments in large vocabulary, continuous speech recognition of German, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.