German broadcast news transcription

We describe a newly created broadcast news (BN) corpus based on programs of seven different German and Austrian TV stations and the development of a German BN transcription system based on this corpus. We report on a series of experiments addressing the fact that German is less suited than English for word-based trigram language models. Furthermore, we investigate various phoneme sets and examine the difference between a transregional standard (Bavarian dialect spoken in southern Germany and Austria) and standard German (Hochdeutsch) on the word error rate.