This paper shows that a domain-dependent language model and state-skipped HMMs improve word recognition accuracy on a broadcast sports news transcription task. Although a domain-dependent language model yields a much lower word error rate than a general model, the training corpus for a specific topic is smaller than the general news corpus, which makes probability estimates unreliable, especially for higher-order n-grams. We therefore used linear interpolation to smooth the unreliable higher-order n-gram probabilities with more reliable lower-order n-gram probabilities. We also adapted the language model using news manuscripts on sports topics. For acoustic modeling, because sports news speech was faster than general news speech, we added two state-skipping paths to the three-state HMMs to handle phonemes shorter than three frames. Overall, we reduced the word error rate from 15.1% to 5.8%, which is sufficient performance to realize real-time subtitling services.
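To make the linear interpolation step concrete, the following is a minimal sketch, not the system described here. It mixes maximum-likelihood trigram, bigram, and unigram estimates with fixed weights. The function names (`count_ngrams`, `interpolated_prob`), the toy corpus, and the weights `(0.6, 0.3, 0.1)` are all assumptions made for illustration; in practice the weights would typically be estimated on held-out data rather than fixed by hand.

```python
from collections import Counter


def count_ngrams(sentences):
    """Collect n-gram counts and history counts from whitespace-tokenized text."""
    uni, bi, tri = Counter(), Counter(), Counter()
    hist1, hist2 = Counter(), Counter()  # counts of 1-word and 2-word histories
    total = 0
    for sentence in sentences:
        # Two start symbols so that every word has a full trigram history.
        toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(toks)):
            u, v, w = toks[i - 2], toks[i - 1], toks[i]
            uni[w] += 1
            bi[(v, w)] += 1
            tri[(u, v, w)] += 1
            hist1[v] += 1
            hist2[(u, v)] += 1
            total += 1
    return uni, bi, tri, hist1, hist2, total


def interpolated_prob(w, u, v, counts, lambdas=(0.6, 0.3, 0.1)):
    """P(w | u, v) as a weighted sum of trigram, bigram, and unigram estimates.

    The weights are illustrative assumptions and must sum to 1.
    """
    uni, bi, tri, hist1, hist2, total = counts
    l3, l2, l1 = lambdas
    p1 = uni[w] / total
    # An unseen history contributes 0, so the lower orders carry the estimate.
    p2 = bi[(v, w)] / hist1[v] if hist1[v] else 0.0
    p3 = tri[(u, v, w)] / hist2[(u, v)] if hist2[(u, v)] else 0.0
    return l3 * p3 + l2 * p2 + l1 * p1


# Toy corpus (hypothetical), standing in for a small domain-specific text.
counts = count_ngrams(["the team won", "the team lost"])
p_seen = interpolated_prob("won", "the", "team", counts)
p_unseen_history = interpolated_prob("lost", "<s>", "team", counts)
```

In this sketch the trigram history `("<s>", "team")` never occurs in the toy corpus, yet `p_unseen_history` is still nonzero because the bigram and unigram terms contribute. That is the effect smoothing is meant to provide when a small topic-specific corpus leaves many higher-order n-grams unobserved.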