Educational video understanding: mapping handwritten text to textbook chapters

Handwritten text frames appear frequently in educational videos and are an important cue for their semantic analysis. We detect text frames using a motion-pattern analysis algorithm. We then extract binary handwritten word images from these frames across several visual formats: handwritten slides, electronic slides, handwriting on a chalkboard, and others. We propose a handwritten word recognition method that combines dynamic-programming, stroke-based character segmentation with optimal statistical handwritten character recognition. In parallel, we construct a small vocabulary of topic words taken from the table of contents of course materials such as the course textbook. We use the handwritten word recognition results to query this table-of-contents structure, and we implement the query as latent semantic analysis (LSA) matrix operations. This lets us identify, for each frame, the chapters and topic words most likely being discussed. We evaluate the overall approach on 12 videos from two courses, and the results are encouraging.
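The abstract says the table-of-contents query is implemented as LSA matrix operations but gives no further detail. What follows is a minimal sketch of one standard way to do such a query, under several assumptions. It uses a raw term-by-chapter count matrix with no tf-idf or other weighting. It keeps a rank-k truncated SVD. It ranks chapters by cosine similarity after projecting both the query and each chapter column onto the k leading left singular vectors. The function names `rank_chapters` and `jacobi_eigen`, the dictionary input format and the sample chapter words are all hypothetical. To keep the sketch dependency-free, the SVD comes from a Jacobi eigendecomposition of AᵀA. That approach is only reasonable for small matrices such as a table of contents.

```python
import math


def jacobi_eigen(S, max_sweeps=100, tol=1e-18):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix,
    computed with cyclic Jacobi rotations."""
    n = len(S)
    A = [list(map(float, row)) for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def rank_chapters(toc_words, recognized_words, k=2):
    """Rank chapters by LSA cosine similarity to the words recognized in one frame.

    toc_words: hypothetical format, chapter name -> list of table-of-contents words.
    recognized_words: words returned by a handwriting recognizer for the frame.
    """
    names = list(toc_words)
    vocab = sorted({w for words in toc_words.values() for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    m, n = len(vocab), len(names)
    # Term-by-chapter count matrix A (m terms x n chapters).
    A = [[0.0] * n for _ in range(m)]
    for j, name in enumerate(names):
        for w in toc_words[name]:
            A[index[w]][j] += 1.0
    # Eigenvectors of A^T A are the right singular vectors; eigenvalues are sigma^2.
    AtA = [[sum(A[i][a] * A[i][b] for i in range(m)) for b in range(n)] for a in range(n)]
    evals, V = jacobi_eigen(AtA)
    top = sorted(range(n), key=lambda i: evals[i], reverse=True)[:k]
    # Keep the k leading left singular vectors u_i = A v_i / sigma_i.
    U = []
    for i in top:
        sigma = math.sqrt(max(evals[i], 0.0))
        if sigma > 1e-12:
            U.append([sum(A[t][j] * V[j][i] for j in range(n)) / sigma for t in range(m)])

    def project(vec):  # coordinates of a term vector in the rank-k latent space
        return [sum(u[t] * vec[t] for t in range(m)) for u in U]

    q = [0.0] * m
    for w in recognized_words:
        if w in index:  # words outside the table-of-contents vocabulary are ignored
            q[index[w]] += 1.0
    q_hat = project(q)
    scores = [(name, cosine(q_hat, project([A[t][j] for t in range(m)])))
              for j, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toc = {  # hypothetical table-of-contents words
        "Ch. 1 Sorting": ["sorting", "quicksort", "mergesort", "heap"],
        "Ch. 2 Priority queues": ["heap", "priority", "queue"],
        "Ch. 3 Graphs": ["graph", "traversal", "shortest", "path", "queue"],
    }
    print(rank_chapters(toc, ["quicksort", "heap", "xyzzy"], k=2))
```

Any recognized word that falls outside the table-of-contents vocabulary adds nothing to the query, and this is one simple way to tolerate recognition errors. A larger implementation would more likely call a library routine such as `numpy.linalg.svd` and apply term weighting before truncation.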
