Recognition of Video Text through Temporal Integration

This paper presents a temporal integration method that improves the recognition accuracy of video text. Given a word detected in a video frame, we first track it both backward and forward in time using a combination of the Stroke Width Transform and SIFT (Scale-Invariant Feature Transform). The text instances within the word's frame span are then extracted and aligned at the pixel level. In the second step, we integrate these instances into a text probability map, and thresholding this map yields an initial binarization of the word. In the final step, the character shapes are refined using the intensity values. This helps preserve distinctive character features, such as sharp edges and holes, that OCR engines rely on to distinguish between character classes. Experiments on English and German videos show that the proposed method outperforms existing methods in terms of recognition accuracy.
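The integration pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. Pixel-level alignment is approximated by integer-translation phase correlation, a swapped-in technique standing in for the Stroke Width Transform and SIFT-based tracking, and it assumes a pure cyclic shift (real crops would need padding or windowing). Each aligned instance is binarized with its own mean intensity as threshold (an assumption), and the text probability map is the per-pixel fraction of instances voting "text". The intensity-based refinement is modeled as relabeling pixels on the boundary of the initial mask by comparing the temporal mean intensity with the midpoint of the text and background means (also an assumption).

```python
import numpy as np


def align_translation(ref, img):
    """Estimate the integer shift (dy, dx) such that
    np.roll(img, (dy, dx), axis=(0, 1)) best matches ref.
    Swapped-in technique: phase correlation, assuming a pure cyclic translation."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(img))
    cross /= np.abs(cross) + 1e-12          # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Peaks in the upper half of each axis correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def text_probability_map(aligned, text_is_dark=False):
    """Per-pixel fraction of aligned instances that classify the pixel as text.
    Assumption: each instance is binarized with its own mean intensity."""
    votes = [(f < f.mean()) if text_is_dark else (f > f.mean()) for f in aligned]
    return np.mean(votes, axis=0)


def _dilate(mask):
    """3x3 binary dilation built from padding and shifted ORs."""
    h, w = mask.shape
    padded = np.pad(mask, 1)                # pads a bool array with False
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def refine_with_intensity(aligned, init_mask, text_is_dark=False):
    """Hypothetical refinement rule: relabel pixels on the boundary band of
    init_mask by comparing the temporal mean intensity with the midpoint of
    the mean text and mean background intensities."""
    if init_mask.all() or not init_mask.any():
        return init_mask.copy()
    mean_img = np.mean(aligned, axis=0)     # averaging suppresses per-frame noise
    thr = 0.5 * (mean_img[init_mask].mean() + mean_img[~init_mask].mean())
    by_intensity = (mean_img < thr) if text_is_dark else (mean_img > thr)
    band = _dilate(init_mask) & _dilate(~init_mask)  # pixels touching both classes
    return np.where(band, by_intensity, init_mask)


def integrate_word(instances, prob_threshold=0.5, text_is_dark=False):
    """Hypothetical end-to-end sketch: align, integrate, threshold, refine."""
    ref = instances[0]
    aligned = [np.roll(f, align_translation(ref, f), axis=(0, 1)) for f in instances]
    prob = text_probability_map(aligned, text_is_dark)
    init_mask = prob >= prob_threshold      # initial binarization of the word
    return refine_with_intensity(aligned, init_mask, text_is_dark)
```

For example, calling `integrate_word(frames)` on a few noisy, shifted copies of a bright-on-dark word crop returns a boolean mask in the coordinate frame of `frames[0]`. Both the vote-based probability map and the mean image are computed on the aligned stack rather than on a single frame, so independent per-frame noise tends to average out, which is the intuition behind integrating over time.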
