Efficient video text recognition using multiple frame integration

Text superimposed on video frames provides supplemental but important information for video indexing and retrieval. Many efforts have been made toward videotext detection and recognition (video OCR). The main difficulties of video OCR are low resolution and background complexity. We present efficient schemes that address the second difficulty by fully exploiting multiple frames that contain the same text, so that every clearly rendered word can be taken from them. First, we use multiple frame verification to reduce false alarms in text detection. Next, we select the frames in which the text is most likely to be clear and therefore more likely to be recognized correctly. We then detect the clear text blocks in those frames and join them into a clearer "man-made" frame. A block-based adaptive thresholding procedure is then applied to these "man-made" frames. Finally, the binarized frames are sent to an OCR engine for recognition. Experiments show that these methods increase the word recognition rate by more than 28%.
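The block-joining and block-based thresholding steps can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how block clarity is measured or how the per-block threshold is chosen, so the sketch assumes intensity variance as a crude clarity proxy and the block mean as the threshold. The function names (`compose_clearest`, `binarize_blockwise`) and the block size are hypothetical, and frames are assumed to be equally sized, already aligned grayscale images stored as lists of rows.

```python
from statistics import mean, pvariance


def blocks(height, width, size):
    """Yield (top, left, bottom, right) for non-overlapping blocks covering the frame."""
    for top in range(0, height, size):
        for left in range(0, width, size):
            yield top, left, min(top + size, height), min(left + size, width)


def block_pixels(frame, box):
    """Return the pixel values of one block as a flat list."""
    top, left, bottom, right = box
    return [frame[y][x] for y in range(top, bottom) for x in range(left, right)]


def compose_clearest(frames, size=4):
    """Build one "man-made" frame by copying, for each block, the candidate
    frame whose block has the highest intensity variance.
    ASSUMPTION: variance stands in for the paper's unspecified clarity test."""
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0] * width for _ in range(height)]
    for box in blocks(height, width, size):
        best = max(frames, key=lambda f: pvariance(block_pixels(f, box)))
        top, left, bottom, right = box
        for y in range(top, bottom):
            out[y][left:right] = best[y][left:right]
    return out


def binarize_blockwise(frame, size=4):
    """Binarize each block against its own mean intensity.
    ASSUMPTION: the mean is an illustrative threshold, not the paper's rule."""
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for box in blocks(height, width, size):
        threshold = mean(block_pixels(frame, box))
        top, left, bottom, right = box
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 1 if frame[y][x] > threshold else 0
    return out


# Two tiny 2x4 frames of the same text region: each is sharp in a different block.
frame_a = [[0, 255, 100, 100], [255, 0, 100, 100]]
frame_b = [[100, 100, 10, 200], [100, 100, 200, 10]]
composed = compose_clearest([frame_a, frame_b], size=2)
binary = binarize_blockwise(composed, size=2)
```

Thresholding per block rather than per frame lets each region adapt to its local background, which is the motivation for a block-based procedure when the background varies across the text area.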
