We present a set of techniques employed in our Janus Recognition Toolkit (JRTk) Switchboard and CallHome recognizer to deal with imperfections in the transcriptions: inconsistent transcription of pronunciations and contractions, and errors in utterance segmentation. The techniques consist of a dynamic, speaking-mode-dependent pronunciation model and a flexible utterance alignment procedure based on speaker-adapted models (label boosting). The idea is (a) to automatically retranscribe the training corpus using these models and procedures, (b) to train a recognizer on the resulting flexible transcription graphs, and (c) to decode with a dynamic, speaking-mode-dependent dictionary. Applied to our state-of-the-art JRTk Switchboard recognizer, the framework increases its performance significantly.
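As a rough illustration of what a speaking-mode-dependent dictionary and a flexible transcription graph might look like, the toy sketch below keeps several pronunciation variants per word, each with a probability that depends on the speaking mode. The abstract does not specify the actual data structures or probabilities used in JRTk, so everything here is an assumption made for illustration: the mode names (`careful`, `casual`), the phone symbols, the probabilities, and the helpers `variants_for` and `transcription_graph` are all hypothetical.

```python
# Hypothetical toy lexicon (not from the paper):
# word -> list of (phone sequence, {speaking mode: probability}).
LEXICON = {
    "going": [
        (("G", "OW", "IH", "NG"), {"careful": 0.9, "casual": 0.4}),
        (("G", "OW", "IH", "N"), {"careful": 0.1, "casual": 0.6}),
    ],
    "to": [
        (("T", "UW"), {"careful": 0.8, "casual": 0.3}),
        (("T", "AX"), {"careful": 0.2, "casual": 0.7}),
    ],
}


def variants_for(word, mode, min_prob=0.0):
    """Return (phones, probability) pairs for `word` under `mode`, most likely first."""
    scored = [(phones, probs[mode]) for phones, probs in LEXICON[word]]
    kept = [(phones, prob) for phones, prob in scored if prob >= min_prob]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def transcription_graph(words, mode):
    """Build one slot per word; each slot lists the alternative pronunciations
    an aligner would be allowed to choose from (assumed representation)."""
    return [variants_for(word, mode) for word in words]


if __name__ == "__main__":
    for slot in transcription_graph(["going", "to"], "casual"):
        print(slot)
```

In this sketch the same word sequence yields a different ranking of variants depending on the mode, which is the property a dynamic dictionary would exploit both during retranscription of the training data and at decoding time.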