JAPANESE BROADCAST NEWS TRANSCRIPTION AND TOPIC DETECTION

This paper reports recent advances in Japanese broadcast news transcription and in automatic topic detection from the transcribed news speech. To cope with the variability in the readings of each word, a new method is proposed that incorporates the reading probability of each word into the decoding process. As a realistic solution to the new-word problem, a second method is proposed in which new words are registered manually and an out-of-vocabulary (OOV) language model is applied to them. To detect topic words in news speech, two methods are proposed: one uses a relevance measure between each word in the news and each word in the topic word set, and the other uses a significance measure for each word based on its frequency ratio.
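The frequency-ratio idea can be illustrated with a minimal sketch. The abstract does not give the exact definition of the significance measure, so the formula below is an assumption: each word is scored by its smoothed relative frequency in one news article divided by its smoothed relative frequency in a background corpus. The function names, the add-one smoothing, and the toy English word lists are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def frequency_ratio_scores(article_words, background_words, smoothing=1.0):
    """Score each article word by (relative frequency in the article) /
    (relative frequency in a background corpus).

    Assumption: this ratio stands in for the paper's significance measure,
    whose exact form is not stated in the abstract.
    """
    article_counts = Counter(article_words)
    background_counts = Counter(background_words)

    # Additive smoothing over the joint vocabulary keeps the ratio finite
    # for words that never occur in the background corpus.
    vocab_size = len(set(article_counts) | set(background_counts))
    article_total = len(article_words) + smoothing * vocab_size
    background_total = len(background_words) + smoothing * vocab_size

    scores = {}
    for word, count in article_counts.items():
        p_article = (count + smoothing) / article_total
        p_background = (background_counts[word] + smoothing) / background_total
        scores[word] = p_article / p_background
    return scores


def top_topic_words(article_words, background_words, k=3):
    """Return the k highest-scoring words; ties are broken alphabetically."""
    scores = frequency_ratio_scores(article_words, background_words)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy data, invented for illustration only.
article = ["earthquake", "in", "kobe", "earthquake", "damage", "in", "the", "city"]
background = ["the", "in", "the", "of", "in", "the", "city", "news", "of", "the"]
print(top_topic_words(article, background))
```

Under this assumed measure, words that are frequent in the article but rare in the background corpus rank highest, while function words that are common everywhere score low.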
