An Automatic Video Text Detection, Localization and Extraction Approach

Text in video is a very compact and accurate clue for video indexing and summarization. This paper presents an algorithm regarding word group as a special symbol to detect, localize and extract video text using support vector machine (SVM) automatically. First, four sobel operators are applied to get the EM(edge map) of the video frame and the EM is segmented into N×2N size blocks. Then character features and characters group structure features are extracted to construct a 19-dimension feature vector. We use a pre-trained SVM to partition each block into two classes: text and non-text blocks. Secondly a dilatation-shrink process is employed to adjust the text position. Finally text regions are enhanced by multiple frame information. After binarization of enhanced text region, the text region with clean background is recognized by OCR software. Experimental results show that the proposed method can detect, localize, and extract video texts with high accuracy.

[1]  David S. Doermann,et al.  Text Extraction, Enhancement and OCR in Digital Video , 1998, Document Analysis Systems.

[2]  Michael R. Lyu,et al.  A robust statistic method for classifying color polarity of video text , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[3]  Xian-Sheng Hua,et al.  Automatic performance evaluation for video text detection , 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[4]  Nobuyuki Otsu,et al.  ATlreshold Selection Method fromGray-Level Histograms , 1979 .

[5]  Michael R. Lyu,et al.  A comprehensive method for multilingual video text detection, localization, and extraction , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[6]  Ke Wang,et al.  Localization site prediction for membrane proteins by integrating rule and SVM classification , 2005, IEEE Transactions on Knowledge and Data Engineering.

[7]  Xinbo Gao,et al.  A spatial-temporal approach for video caption detection and recognition , 2002, IEEE Trans. Neural Networks.

[8]  Hongjiang Zhang,et al.  Content-Based Video Analysis, Retrieval and Browsing , 2003 .

[9]  Jean-Philippe Thiran,et al.  Text identification in complex background using SVM , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[10]  Clement T. Yu,et al.  Techniques and Systems for Image and Video Retrieval , 1999, IEEE Trans. Knowl. Data Eng..

[11]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[12]  Takeo Kanade,et al.  Video OCR: indexing digital news libraries by recognition of superimposed captions , 1999, Multimedia Systems.