SUMMARIZATION OF SPOKEN LECTURES BASED ON LINGUISTIC SURFACE AND PROSODIC INFORMATION

We aim to automatically summarize spoken lectures given at conferences and in classes. To this end, we first compared summaries produced by human subjects and found that they differed considerably from subject to subject. We then investigated the relation between linguistic surface information and the human summaries, and identified the linguistic surface features that were useful. Next, we summarized conference and classroom lectures using these linguistic features. We also examined prosodic features, namely F0 and power, and conducted the same experiments with them. Finally, we combined the linguistic surface information with the prosodic information. As a result, the proposed automatic summarization produced better scores, with an F-measure of 0.599, a k-value of 0.420, and a ROUGE score of 0.758, comparable with human results.
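
As a rough illustration of the combination and evaluation steps described above, the following minimal sketch ranks sentences by a weighted sum of a linguistic score and a prosodic score, extracts the top-ranked sentences, and compares them with a human-selected set using the F-measure. The function names, the linear weighting, and the example scores are assumptions made for illustration only; the abstract does not specify how the features are combined or how the scores are computed.

```python
def extract_summary(linguistic, prosodic, weight, k):
    """Pick the k sentences with the highest combined score.

    linguistic, prosodic: per-sentence scores (hypothetical, assumed precomputed).
    weight: share given to the linguistic score (assumed linear combination).
    Returns the selected sentence indices in document order.
    """
    combined = [weight * l + (1 - weight) * p
                for l, p in zip(linguistic, prosodic)]
    ranked = sorted(range(len(combined)),
                    key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:k])


def f_measure(system, reference):
    """Harmonic mean of precision and recall over selected sentence indices."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Illustrative scores for a four-sentence lecture (made-up values).
linguistic_scores = [0.9, 0.1, 0.6, 0.3]
prosodic_scores = [0.2, 0.8, 0.7, 0.1]

selected = extract_summary(linguistic_scores, prosodic_scores, weight=0.5, k=2)
human_selected = [0, 1]  # sentences a human subject marked as important
score = f_measure(selected, human_selected)
```

Because human summaries vary between subjects, a score of this kind would in practice be computed against each subject's selection and then aggregated.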
