Multigram-based grapheme-to-phoneme conversion for LVCSR

Many important speech recognition tasks, such as broadcast news transcription and spoken document retrieval, feature an open, constantly changing vocabulary. Recognizing new words requires that their acoustic baseforms be known. Commonly, words are transcribed manually, which places a major burden on vocabulary adaptation and inter-domain portability. In this work we investigate the possibility of applying a data-driven grapheme-to-phoneme converter to obtain the necessary phonetic transcriptions. Experiments were carried out on English and German speech recognition tasks. We study the relation between transcription quality and word error rate and show that this method can significantly reduce manual transcription effort with an acceptable loss in performance.