Paper Information - Automated Whiteboard Lecture Video Summarization by Content Region Detection and Representation

Lecture videos are rapidly becoming an invaluable source of information for students across the globe. Given the large number of online courses currently available, it is important to condense the information within these videos into a compact yet representative summary that can be used for search-based applications. We propose a framework to summarize whiteboard lecture videos by finding feature representations of detected handwritten content regions to determine unique content. We investigate multi-scale histogram of gradients and embeddings from deep metric learning for feature representation. We explicitly handle occluded, growing and disappearing handwritten content. Our method is capable of producing two kinds of lecture video summaries: the unique regions themselves (so-called key content), and keyframes (which contain all unique content in a video segment). We use weighted spatio-temporal conflict minimization to segment the lecture and produce keyframes from detected regions and features. We evaluate both types of summaries and find that we obtain state-of-the-art performance in terms of the number of summary keyframes, while our unique content recall and precision are comparable to the state of the art.
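The idea of using feature representations of detected regions to determine unique content can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy selection rule, the cosine-similarity measure, the `threshold` value, and the function names `cosine_similarity` and `select_unique_regions` are all assumptions made for illustration, and the toy vectors stand in for whatever features (for example HOG descriptors or learned embeddings) a real system would compute.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_unique_regions(features, threshold=0.9):
    """Greedily keep the indices of regions that are not near-duplicates.

    A region is kept only if its similarity to every previously kept
    region is below `threshold` (a hypothetical value, not from the paper).
    """
    unique_indices = []
    for i, feat in enumerate(features):
        if all(cosine_similarity(feat, features[j]) < threshold
               for j in unique_indices):
            unique_indices.append(i)
    return unique_indices


# Toy feature vectors for four detected regions (made-up values).
features = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # near-duplicate of region 0
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],   # near-duplicate of region 2
]
unique = select_unique_regions(features)
print(unique)
```

In this toy input, regions 1 and 3 are almost parallel to regions 0 and 2 respectively, so only the first occurrence of each piece of content is retained. A real pipeline would also need the handling of occluded, growing and disappearing content that the abstract mentions, which this sketch does not attempt.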
