VAST MM: multimedia browser for presentation video

In the domain of candidly captured student presentation videos, we examine and evaluate approaches for multi-modal analysis and indexing of audio and video. We apply visual segmentation techniques to unedited video to determine likely changes of topic. Speaker segmentation methods are employed to determine individual student appearances, which are linked to extracted headshots to create a visual speaker index. Videos are augmented with time-aligned, filtered keywords and phrases from highly inaccurate speech transcripts. Our experimental user interface, the VAST MM Browser (Video Audio Structure Text Multi Media Browser), combines streaming video with visual and textual indices for browsing and searching. We evaluate the UI and methods in a large engineering design course. We report on observations and statistics collected over 4 semesters from 598 student participants. Results suggest that our video indexing and retrieval approach is effective, and that our continuous improvements are reflected in increased precision and recall on user study tasks.
