A Robust Approach for Overlay Text Localization and Extraction in Complex Video Scene

Overlay text in video carries important semantic clues for video information retrieval and summarization. In this paper, we propose a robust method that accurately locates text lines and extracts text even in complex video scenes. The text localization stage is based on corner points. First, corner detection extracts corners from video frames as text features. Then a multi-layer filtering mechanism (MLFM), consisting of corner clustering, horizontal projection of corners, background filtering, and heuristic rules, locates the text lines. The MLFM effectively removes isolated corners, locates text lines accurately, and automatically removes background regions and pseudo text lines. For the text extraction stage, we propose a two-pass binarization method combined with a polarity judgment on the image. The polarity judgment guides the adjustment of the threshold used in the first binarization. The first binarization handles the major portion of the image, and the second binarization processes the remainder. Experimental results show that this approach locates text lines and extracts text in video quickly and robustly, even against complex backgrounds.
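
As a rough illustration of the horizontal projection step only, the sketch below counts detected corners per image row and keeps contiguous runs of dense rows as candidate text-line bands. It is not the authors' implementation: the function name `locate_text_rows` and the parameters `min_count` and `min_height` are hypothetical, the thresholds are arbitrary, and the corner coordinates are assumed to come from some earlier corner detector. Clustering, background filtering, and the heuristic rules of the MLFM are not modeled.

```python
from typing import List, Tuple


def locate_text_rows(
    corners: List[Tuple[int, int]],
    height: int,
    min_count: int = 3,   # assumed threshold: corners needed for a row to count as dense
    min_height: int = 2,  # assumed threshold: minimum band height in rows
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row bands, inclusive, that are dense in corners."""
    # Horizontal projection: number of corners falling on each image row.
    hist = [0] * height
    for _x, y in corners:
        if 0 <= y < height:
            hist[y] += 1

    bands: List[Tuple[int, int]] = []
    start = None  # first row of the band currently being built, if any
    for row, count in enumerate(hist):
        if count >= min_count:
            if start is None:
                start = row
        elif start is not None:
            # The band ended on the previous row; keep it if tall enough.
            if row - start >= min_height:
                bands.append((start, row - 1))
            start = None
    # Close a band that runs to the bottom edge of the frame.
    if start is not None and height - start >= min_height:
        bands.append((start, height - 1))
    return bands


# Three dense rows (2 to 4) plus one isolated corner on row 8.
corners = [(x, y) for y in (2, 3, 4) for x in (1, 5, 9)] + [(4, 8)]
print(locate_text_rows(corners, height=10))  # -> [(2, 4)]
```

The isolated corner on row 8 is discarded because its row never reaches `min_count`, which mirrors in a very simplified way the abstract's point that projection helps suppress isolated corners.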
