A unified text extraction method for instructional videos

Videotext can serve as an efficient semantic index and summary for instructional videos. However, videotext appears in many visual formats: handwritten slides, electronic slides, book pages, web pages, handwriting on chalkboards, and so on. We propose a unified approach that handles all of these kinds of videotext in three steps. First, we detect still video segments by analyzing motion energy patterns in instructional videos, and construct a quality-enhanced candidate text frame for each still segment. Next, we use a trained SVM classifier to verify the candidate text frames and to segment the text region and individual text blocks within each verified frame. Finally, we remove redundant text frames that share similar text content using an image comparison algorithm based on the Hausdorff distance. The resulting text frames are automatically organized into HTML and PDF documents that serve as an image-based summary of each instructional video. We apply our method to 75 instructional videos from five different courses and discuss its applications.
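The first step can be illustrated with a minimal Python sketch. The abstract does not define motion energy or the enhancement procedure, so this sketch assumes that motion energy is the mean absolute grayscale difference between consecutive frames. It treats a run of at least `min_len` low-energy frames as a still segment, and uses a pixel-wise temporal average as a simple stand-in for the quality-enhanced candidate frame. The function names and the `threshold` and `min_len` parameters are hypothetical.

```python
from typing import List, Tuple

# A grayscale frame as rows of pixel intensities (hypothetical representation).
Frame = List[List[float]]


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Assumed definition: mean absolute intensity difference between two frames."""
    total = 0.0
    count = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def still_segments(frames: List[Frame], threshold: float, min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of runs of at least
    min_len frames whose frame-to-frame motion energy stays below threshold."""
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if motion_energy(frames[i - 1], frames[i]) >= threshold:
            # Motion detected: close the current run just before frame i.
            if i - start >= min_len:
                segments.append((start, i))
            start = i
    # Close the final run at the end of the video.
    if len(frames) - start >= min_len:
        segments.append((start, len(frames)))
    return segments


def average_frame(frames: List[Frame]) -> Frame:
    """Pixel-wise temporal mean, used here as a simple stand-in for a
    quality-enhanced candidate text frame."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)] for y in range(height)]
```

For example, six 2x2 frames (three all-0 followed by three all-10) yield the two still segments `[(0, 3), (3, 6)]` with `threshold=5.0` and `min_len=2`. Averaging over a whole still segment suppresses frame-to-frame noise while preserving static text, which is one plausible reason to build a candidate frame from a segment rather than from a single frame.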
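The redundancy filter in the last step can be sketched as follows, assuming each text frame is reduced to the set of its text pixel coordinates by simple thresholding. The sketch uses the classical symmetric Hausdorff distance, H(A, B) = max(h(A, B), h(B, A)), where h(A, B) is the largest distance from any point of A to its nearest point in B. It keeps a frame only if it is farther than `max_dist` from every frame already kept. The binarization rule, the comparison against all kept frames, and the names `level` and `max_dist` are assumptions for illustration; the abstract does not specify them.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[int, int]


def text_pixels(frame: Sequence[Sequence[float]], level: float) -> Set[Point]:
    """Assumed binarization: (row, col) of pixels at or above `level` count as text."""
    return {(y, x) for y, row in enumerate(frame) for x, v in enumerate(row) if v >= level}


def directed_hausdorff(a: Set[Point], b: Set[Point]) -> float:
    """h(A, B): largest distance from a point of A to its nearest point of B.
    Both sets must be non-empty."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Set[Point], b: Set[Point]) -> float:
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    if not a and not b:
        return 0.0
    if not a or not b:
        return math.inf
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def filter_redundant(frames: Sequence[Sequence[Sequence[float]]],
                     level: float, max_dist: float) -> List[int]:
    """Return indices of frames kept after dropping any frame whose text
    pixels lie within max_dist (Hausdorff) of an already kept frame."""
    kept: List[int] = []
    kept_sets: List[Set[Point]] = []
    for i, frame in enumerate(frames):
        pts = text_pixels(frame, level)
        if all(hausdorff(pts, s) > max_dist for s in kept_sets):
            kept.append(i)
            kept_sets.append(pts)
    return kept
```

The brute-force distance computation is quadratic in the number of text pixels, which is acceptable for a sketch. A practical implementation would more likely use a distance transform to look up nearest text pixels quickly, and might use a partial (rank-order) Hausdorff distance to tolerate small amounts of noise.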
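The final organization step could look like the following sketch, which writes a bare HTML page listing each kept text frame with the video time at which its still segment begins. The page layout, the mm:ss timestamp format, the function name `summary_html`, and the image file names are hypothetical. The abstract says only that the frames are organized into HTML and PDF documents, and PDF generation is not shown here.

```python
import html
from typing import List, Tuple


def summary_html(title: str, frames: List[Tuple[float, str]]) -> str:
    """Build a minimal HTML page from (segment start in seconds, image path)
    pairs, showing each text frame image under its mm:ss timestamp."""
    parts = [
        f"<html><head><title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for start_sec, image_path in frames:
        minutes, seconds = divmod(int(start_sec), 60)
        stamp = f"{minutes:02d}:{seconds:02d}"
        parts.append(
            f'<p>{stamp}<br><img src="{html.escape(image_path, quote=True)}" '
            f'alt="text frame at {stamp}"></p>'
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

For instance, a frame whose still segment starts at 75.4 seconds is listed under `01:15`.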
