Content Extraction and Summarization of Instructional Videos

This paper presents a robust approach to extracting and summarizing the textual content of instructional videos for handwriting recognition, indexing and retrieval, and other e-learning applications. Content extraction from instructional videos is challenging because of image noise, changing lighting conditions, camera movement, and unavoidable occlusion by the instructor. We develop a probabilistic model to accurately detect board regions and an adaptive thresholding technique to extract the chalk pixels written on blackboards. We further compute key frames by analyzing fluctuations in the number of chalk pixels. By matching the textual content of video frames with a Hausdorff-distance-based technique, we reduce content redundancy among the key frames. A performance evaluation on three full-length instructional videos shows that our algorithm is highly effective in summarizing instructional video content and achieves very low rates of missed content.
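The abstract names the pipeline stages but does not specify the thresholding rule, the key-frame criterion, or the matching threshold. The following pure-Python sketch shows one way the stages could fit together. It is not the authors' implementation. The local-mean threshold, the drop-based key-frame rule, the containment test, and the names `adaptive_threshold`, `key_frames`, `prune_key_frames`, `radius`, `offset`, `drop_ratio`, and `tau` are all hypothetical. Only the Hausdorff distance follows its standard definition: H(A, B) = max(h(A, B), h(B, A)), where h(A, B) is the largest distance from a point of A to its nearest point in B.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def adaptive_threshold(gray: Sequence[Sequence[int]], radius: int = 1,
                       offset: float = 10) -> List[List[int]]:
    """HYPOTHETICAL local-mean rule (not the paper's): mark a pixel as chalk (1)
    if it is brighter than the mean of its (2*radius+1)^2 window, clipped at
    the image border, by more than `offset`."""
    h, w = len(gray), len(gray[0])
    mask = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [gray[y][x]
                      for y in range(max(0, r - radius), min(h, r + radius + 1))
                      for x in range(max(0, c - radius), min(w, c + radius + 1))]
            row.append(1 if gray[r][c] > sum(window) / len(window) + offset else 0)
        mask.append(row)
    return mask


def chalk_points(mask: Sequence[Sequence[int]]) -> List[Point]:
    """(row, col) coordinates of all chalk pixels in a binary mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def key_frames(counts: Sequence[int], drop_ratio: float = 0.3) -> List[int]:
    """HYPOTHETICAL key-frame rule: track the frame with the most chalk pixels
    and emit it when the count falls more than `drop_ratio` below that peak
    (e.g. the board is erased or occluded). The last peak is always emitted."""
    keys: List[int] = []
    peak = None
    for i, n in enumerate(counts):
        if peak is None or n >= counts[peak]:
            peak = i
        elif n < (1 - drop_ratio) * counts[peak]:
            keys.append(peak)
            peak = i  # start tracking a new segment from the dropped frame
    if peak is not None:
        keys.append(peak)
    return keys


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """h(A, B): the largest distance from a point of A to its nearest point in B."""
    if not a:
        return 0.0  # an empty set is trivially contained in anything
    if not b:
        return math.inf
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def prune_key_frames(masks: Sequence[Sequence[Sequence[int]]],
                     tau: float = 2.0) -> List[int]:
    """HYPOTHETICAL redundancy test: drop key frame i when every chalk pixel of
    it lies within `tau` pixels of chalk in key frame i+1, i.e. its content
    survives in the next key frame. The last key frame is always kept."""
    points = [chalk_points(m) for m in masks]
    kept = []
    for i, pts in enumerate(points):
        if i + 1 < len(points) and directed_hausdorff(pts, points[i + 1]) <= tau:
            continue  # content of frame i reappears in frame i+1: redundant
        kept.append(i)
    return kept
```

A later frame of the same board usually adds writing, so the sketch tests containment with the directed distance h(earlier, later). The symmetric H would treat newly added chalk as a difference and keep both frames. The brute-force nearest-neighbour search is quadratic in the number of chalk pixels, so a practical version would likely use a distance transform instead.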
