Automatic keyphrase extraction and segmentation of video lectures

Keyphrases are essential metadata that summarize the contents of an instructional video. In this paper, we present a domain-independent, statistical approach for automatic keyphrase extraction from audio transcripts of video lectures. We identify new features in audio transcripts that capture key patterns characterizing keyphrases in lecture videos. We design a keyphrase extraction system that uses a supervised machine learning algorithm, based on a Naive Bayes classifier, to extract relevant keyphrases. Our extensive experimental studies show that our system extracts more relevant keyphrases than existing approaches. The paper also evaluates the performance of the proposed keyphrase extraction method for different categories of lectures. The extracted keyphrases are then used as features for automatic topic-based segmentation of the video lectures. This process of automatic keyphrase extraction and segmentation yields a section-wise annotated video lecture that can be viewed effectively in a lecture browser.
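To make the classification step concrete, the following is a minimal sketch of a Naive Bayes classifier that labels candidate phrases as keyphrase or non-keyphrase. It is not the system described in the paper: the abstract does not list the actual features, so the two discretized features used here (a TF-IDF bucket and a first-occurrence bucket), the class name `NaiveBayesKeyphraseClassifier`, and the toy training rows are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesKeyphraseClassifier:
    """Naive Bayes over discrete features, with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature_index][value] -> number of training rows
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        # values[feature_index] -> set of values seen in training
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def log_scores(self, row):
        """Return log P(class) + sum_i log P(feature_i | class) per class."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.total)
            for i, value in enumerate(row):
                numerator = self.counts[c][i][value] + 1
                denominator = self.class_counts[c] + len(self.values[i])
                score += math.log(numerator / denominator)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical training data: (tfidf_bucket, first_occurrence_bucket).
# True means the candidate phrase was marked as a keyphrase by an annotator.
rows = [
    ("high", "early"), ("high", "early"), ("high", "late"),
    ("low", "late"), ("low", "late"), ("low", "early"),
]
labels = [True, True, True, False, False, False]

classifier = NaiveBayesKeyphraseClassifier().fit(rows, labels)
print(classifier.predict(("high", "early")))
print(classifier.predict(("low", "late")))
```

In a full pipeline, each candidate phrase from a lecture transcript would be mapped to a feature row like those above, and the candidates with the highest keyphrase score would be kept. Working in log space avoids numerical underflow when many features are multiplied together.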
