Key frame extraction for text based video retrieval using Maximally Stable Extremal Regions

This paper presents a new approach to text-based video content retrieval. The proposed scheme consists of three main processes: key frame extraction, text localization, and keyword matching. For key frame extraction, we propose a Maximally Stable Extremal Region (MSER) based feature designed to segment the video into shots with different text contents. In the text localization process, the MSERs in each key frame are clustered by their similarity in position, size, color, and stroke width to form text lines. The Tesseract OCR engine is then used to recognize the text regions. To improve the recognition results, we feed four images, each obtained from a different pre-processing method, to the Tesseract engine. Finally, the query keyword is matched against the OCR results using an approximate string search scheme. The experiments show that, with the MSER feature, the videos can be segmented into an efficient number of shots, with better precision and recall than the sum-of-absolute-difference and edge-based methods.
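The keyword-matching step can be illustrated with a minimal sketch. The abstract does not specify the exact approximate string search scheme, so the code below assumes a standard edit-distance search for the keyword inside the OCR text, normalized by keyword length. The function names and the `max_error_rate` threshold are hypothetical and chosen only for illustration.

```python
def best_substring_distance(keyword: str, text: str) -> int:
    """Smallest edit distance between `keyword` and any substring of `text`."""
    # prev[j] holds the cost of aligning the keyword prefix processed so far
    # with the best substring of `text` that ends at position j.
    # Row 0 is all zeros because a match may start anywhere in the text.
    prev = [0] * (len(text) + 1)
    for i, kc in enumerate(keyword, start=1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if kc == tc else 1
            curr[j] = min(
                prev[j] + 1,         # keyword character missing from the text
                curr[j - 1] + 1,     # extra character in the text
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return min(prev)


def keyword_matches(keyword: str, ocr_text: str, max_error_rate: float = 0.25) -> bool:
    """Return True if `keyword` occurs in `ocr_text` within the allowed error rate.

    `max_error_rate` is an assumed tolerance (edits per keyword character),
    not a value taken from the paper.
    """
    keyword, ocr_text = keyword.lower(), ocr_text.lower()
    if not keyword:
        return False
    distance = best_substring_distance(keyword, ocr_text)
    return distance / len(keyword) <= max_error_rate


if __name__ == "__main__":
    # OCR commonly confuses "l" with "1"; the keyword is still found.
    print(keyword_matches("retrieval", "Video Retrieva1 System"))
    print(keyword_matches("retrieval", "weather forecast"))
```

Normalizing by keyword length keeps the tolerance proportional, so a single OCR error disqualifies a very short keyword but not a long one.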
