Class lecture summarization taking into account consecutiveness of important sentences

This paper presents a novel sentence extraction framework that takes into account the consecutiveness of important sentences using a Support Vector Machine (SVM). Most extractive summarizers do not take context information into account, although they do consider redundancy over the entire summary. However, relationships must exist among the extracted sentences, and these relationships can be observed as consecutiveness among the sentences. We deal with this consecutiveness by using dynamic and difference features to decide whether a sentence should be extracted. Since important sentences tend to be extracted consecutively, we simply use the decision made for the previous sentence as the dynamic feature. For the difference feature, we use the differences between the current and previous feature values, since adjacent sentences within a block of important ones should have similar feature values to each other, whereas there should be a larger difference in feature values between an important sentence and an unimportant one. We also present a way to ensure that no redundant summarization occurs. Experimental results on the Corpus of Japanese Classroom Lecture Speech Contents are also reported.
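The feature augmentation described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the feature names, weights, and the fixed linear scorer (a stand-in for the trained SVM) are all assumptions made for the example.

```python
# Sketch of sequential sentence extraction with dynamic and difference
# features. Each sentence's base features are augmented with (a) the
# decision made for the previous sentence (dynamic feature) and (b) the
# element-wise difference from the previous sentence's base features
# (difference feature). All weights below are illustrative only.

def augment(base, prev_base, prev_decision):
    """Build the augmented feature vector for one sentence."""
    diff = [b - p for b, p in zip(base, prev_base)]
    return base + [float(prev_decision)] + diff

def extract(sentences, weights, bias=0.0):
    """Greedy left-to-right labeling with a fixed linear scorer
    (standing in for the trained SVM's decision function)."""
    decisions = []
    prev_base = [0.0] * len(sentences[0])
    prev_decision = 0
    for base in sentences:
        x = augment(base, prev_base, prev_decision)
        score = sum(w * v for w, v in zip(weights, x)) + bias
        d = 1 if score > 0 else 0  # 1 = extract the sentence
        decisions.append(d)
        prev_base, prev_decision = base, d
    return decisions
```

With a positive weight on the dynamic feature, a borderline sentence that immediately follows an extracted one is pushed over the decision threshold, which is exactly the consecutiveness effect the dynamic feature is meant to capture.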
