Many important speech recognition tasks feature an open, constantly changing vocabulary (e.g. broadcast news transcription, spoken document retrieval). Recognizing (new) words requires that their acoustic baseforms be known. Words are commonly transcribed manually, which places a major burden on vocabulary adaptation and inter-domain portability. In this work we investigate the possibility of applying a data-driven grapheme-to-phoneme converter to obtain the necessary phonetic transcriptions. Experiments were carried out on English and German speech recognition tasks. We study the relation between transcription quality and word error rate and show that manual transcription effort can be reduced significantly by this method with an acceptable loss in performance.
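To make the workflow concrete, below is a minimal, self-contained sketch of the idea in Python: learn a grapheme-to-phoneme mapping from a small hand-transcribed seed lexicon and use it to generate baseforms for new words automatically, so they can be added to the recognition lexicon without manual transcription. This toy uses a unigram per-grapheme mapping rather than the joint-multigram converter studied in the paper, and the seed lexicon, function names, and phoneme symbols are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy illustration only: a unigram grapheme-to-phoneme mapping.
# The paper uses a data-driven joint-multigram converter; this sketch
# just shows the workflow of deriving baseforms for new words
# automatically instead of transcribing them by hand.

SEED_LEXICON = {
    # word -> phoneme sequence (one phoneme per grapheme for simplicity)
    "dog": ["d", "ao", "g"],
    "dig": ["d", "ih", "g"],
    "bog": ["b", "ao", "g"],
    "bid": ["b", "ih", "d"],
}

def train_unigram_g2p(lexicon):
    """Count which phoneme each grapheme maps to most often."""
    counts = defaultdict(Counter)
    for word, phones in lexicon.items():
        # Naive 1:1 alignment; only valid here because the toy entries
        # have equal numbers of graphemes and phonemes.
        for g, p in zip(word, phones):
            counts[g][p] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}

def transcribe(word, g2p, unknown="?"):
    """Generate an automatic baseform for a (possibly new) word."""
    return [g2p.get(g, unknown) for g in word]

if __name__ == "__main__":
    g2p = train_unigram_g2p(SEED_LEXICON)
    # "big" is out of vocabulary: its baseform is derived automatically,
    # so it could be added to the recognition lexicon without manual work.
    print("big ->", transcribe("big", g2p))  # ['b', 'ih', 'g']
```

In practice the converter would be trained on a full pronunciation dictionary and evaluated by the resulting word error rate, as described in the abstract.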