This paper describes speech summarization using F0 information. Summarization in this work is realized by extracting important sentences from hand-transcribed text. The central problem in this framework is the automatic scoring of sentence importance based on prosodic information from the speech waveform as well as linguistic information from the written text. Prosody conveys nonlinguistic information such as the speaker's intention and helps identify important portions of the speech. The prosodic information is represented in terms of F0 parameters of the Japanese bunsetsu unit, which is almost equivalent to a prosodic minor phrase. Six kinds of F0 parameters are compared with regard to their correlation with sentence importance and their performance in extracting important sentences. Evaluation results show that introducing F0 parameters is effective for speech summarization.
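The scoring framework described above can be illustrated with a minimal sketch. This is not the method of the paper: the abstract does not name the six F0 parameters or say how prosodic and linguistic scores are combined, so the three parameters used here (mean, maximum, and range of log F0), the linear interpolation weight, and all function and field names are assumptions chosen only for illustration.

```python
import math
from statistics import mean

def f0_parameters(f0_hz):
    """Illustrative F0 parameters for one bunsetsu, from frame-level F0 in Hz.

    Hypothetical choice of parameters; frames with F0 <= 0 are treated as unvoiced.
    """
    log_f0 = [math.log(v) for v in f0_hz if v > 0]
    if not log_f0:
        return {"mean": 0.0, "max": 0.0, "range": 0.0}
    return {
        "mean": mean(log_f0),
        "max": max(log_f0),
        "range": max(log_f0) - min(log_f0),
    }

def sentence_score(linguistic_score, bunsetsu_f0, weight=0.5, param="range"):
    """Interpolate a linguistic score with the sentence-level average of one
    F0 parameter (assumed combination rule, not taken from the paper)."""
    values = [f0_parameters(b)[param] for b in bunsetsu_f0]
    prosodic = mean(values) if values else 0.0
    return (1 - weight) * linguistic_score + weight * prosodic

def extract_important(sentences, n, weight=0.5):
    """Return the indices of the n highest-scoring sentences, in document order."""
    scored = [
        (sentence_score(s["linguistic"], s["f0"], weight), i)
        for i, s in enumerate(sentences)
    ]
    top = sorted(scored, reverse=True)[:n]
    return sorted(i for _, i in top)
```

Each sentence is assumed to be a dictionary holding a precomputed linguistic score under `"linguistic"` and a list of per-bunsetsu F0 frame sequences under `"f0"`. Setting `weight=0.0` reduces the sketch to text-only extraction, which gives a simple baseline against which the contribution of an F0 parameter could be compared.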