Bag of subjects: lecture videos multimodal indexing

In this paper, we address multimodal indexing and retrieval for videos of lectures and seminars. We propose a combination of techniques drawn from document image analysis and text mining. Using the visual and textual information extracted from slide images, we investigate a Bag of Mixed Words model (visual words and textual words) to represent the content of lecture slides. Lecture videos are then indexed and retrieved using this extended Bag of Words model. The model assumes that a video may contain multiple subjects; it automatically discovers the visual representation of these subjects and indexes the video accordingly. We discuss mixed text/image queries and the proposed indexing approach for retrieving lecture videos, and report a quantitative evaluation on lecture videos from our lab.
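
To make the idea of a mixed bag concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes that visual words have already been quantized to integer IDs (for example by clustering local image descriptors) and that textual words have already been extracted from the slides. The function names `mixed_bag`, `cosine`, and `rank`, the `v:`/`t:` prefixes, and the sample data are all hypothetical. The sketch covers only the mixed representation and a simple cosine ranking; it does not attempt the automatic subject discovery described above.

```python
from collections import Counter
from math import sqrt


def mixed_bag(visual_word_ids, text_tokens):
    """Build one histogram over visual and textual words.

    Prefixes keep the two vocabularies apart (hypothetical convention).
    """
    bag = Counter(f"v:{word_id}" for word_id in visual_word_ids)
    bag.update(f"t:{token.lower()}" for token in text_tokens)
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse term-count histograms."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, index):
    """Return video names ordered from most to least similar to the query."""
    return sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)


# Illustrative data only: two videos, each summarized by one mixed bag.
index = {
    "lecture_a": mixed_bag([3, 3, 17], ["fourier", "transform"]),
    "lecture_b": mixed_bag([8, 21], ["markov", "chain"]),
}

# A mixed text/image query: one visual word and one keyword.
query = mixed_bag([3], ["Fourier"])
ranking = rank(query, index)
```

Here a single histogram holds both modalities, so a query that mixes an image region and a keyword is matched in one pass. In practice the two modalities would likely need separate weighting, which this sketch leaves out.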