Investigating text normalization and pronunciation variants for German broadcast transcription

In this paper we describe our ongoing work on lexical modeling in the LIMSI broadcast transcription system for German. Lexical decomposition is investigated with a twofold goal: lexical coverage optimization and improved letter-to-sound conversion. A set of about 450 decompounding rules, developed using statistics from a 300M word corpus, reduces the out-of-vocabulary (OOV) rate from 4.5% to 4.0% on a 30k development text set. With partial inflection stripping added, the OOV rate drops to 2.9%. For letter-to-sound conversion, decompounding reduces cross-lexeme ambiguities and thus contributes to more consistent pronunciation dictionaries. Another point of interest concerns reduced pronunciation modeling. Word error rates, measured on 1.3 hours of ARTE TV broadcast, vary between 18% and 24% depending on the show and the system configuration. Our experiments indicate that using reduced pronunciations slightly decreases word error rates.
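To make the coverage measurement concrete, the following minimal sketch shows how an OOV rate can be computed before and after a simple decompounding pass. It is an illustration only and does not reproduce the roughly 450 rules described above: the function names (`oov_rate`, `decompound`), the single linking element "s", the minimum part length of three letters, and the toy vocabulary are all assumptions introduced for this example.

```python
def oov_rate(tokens, vocab):
    """Fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocab) / len(tokens)


def decompound(token, vocab, linkers=("", "s")):
    """Split a token into two in-vocabulary parts if possible.

    Hypothetical rule: try every split point that leaves at least three
    letters on each side, optionally dropping a linking element
    (here only "s") from the end of the first part.
    Returns the token unchanged when no split is found.
    """
    if token in vocab:
        return [token]
    for i in range(3, len(token) - 2):
        head, tail = token[:i], token[i:]
        if tail not in vocab:
            continue
        for linker in linkers:
            if not head.endswith(linker):
                continue
            stem = head[:len(head) - len(linker)] if linker else head
            if stem in vocab:
                return [stem, tail]
    return [token]


# Toy data (lowercased), invented for this example.
vocab = {"der", "die", "wetter", "bericht", "arbeit", "markt"}
tokens = ["der", "wetterbericht", "die", "arbeitsmarkt", "bundestag"]

# Apply decompounding to every token and flatten the result.
split_tokens = [part for t in tokens for part in decompound(t, vocab)]

print(oov_rate(tokens, vocab))        # rate on the original tokens
print(oov_rate(split_tokens, vocab))  # rate after decompounding
```

Note that decompounding changes the number of running tokens, so the two rates are computed over different denominators; a real evaluation has to state which token count it reports against.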
