In this paper, we address multimodal indexing and retrieval for videos of lectures and seminars, combining techniques from image document analysis and text mining. Based on visual and textual information extracted from slide images, we investigate a Bag of mixed Words model (visual words and textual words) to represent the content of lecture slides. Lecture videos are then indexed and retrieved using an extended Bag of Words model. This model assumes that a video may cover multiple subjects; it automatically discovers the visual representation of these subjects and indexes the video accordingly. We discuss mixed text/image queries, propose an indexing approach for retrieving lecture videos, and report a quantitative evaluation on lecture videos recorded in our lab.
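To make the idea concrete, the following is a minimal sketch of how a mixed Bag of Words representation might be built and matched against a mixed text/image query. The function names, the prefixing scheme for keeping the visual and textual vocabularies disjoint, and the use of cosine similarity are illustrative assumptions, not details taken from the paper:

```python
import math
from collections import Counter

def mixed_bag_of_words(visual_words, textual_words):
    """Combine quantized visual words (cluster ids) and textual words
    (e.g. extracted from slide images by OCR) into one bag of words.
    Prefixes keep the two vocabularies disjoint; this scheme is an
    illustrative assumption, not the paper's exact formulation."""
    bag = Counter(f"v_{w}" for w in visual_words)
    bag.update(f"t_{w.lower()}" for w in textual_words)
    return dict(bag)

def cosine_similarity(query, doc):
    """Cosine similarity between two bag-of-words dictionaries,
    used to rank slides against a mixed text/image query."""
    dot = sum(count * doc.get(word, 0) for word, count in query.items())
    nq = math.sqrt(sum(c * c for c in query.values()))
    nd = math.sqrt(sum(c * c for c in doc.values()))
    return dot / (nq * nd) if nq and nd else 0.0

# A slide described by visual words {12, 12, 47} and OCR words.
slide = mixed_bag_of_words([12, 12, 47], ["Bayes", "theorem"])
# A query mixing one visual word with one textual word.
query = mixed_bag_of_words([12], ["bayes"])
score = cosine_similarity(query, slide)
```

In a full system the same representation would be aggregated per video segment, so that a video covering multiple subjects yields several distinct bags, one per discovered subject.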