Automatic Video Text Localization and Recognition

Text in video carries rich semantic information that can be used for video indexing and summarization. In this paper, we design an integrated algorithm for locating horizontal text that combines corner point detection with color clustering. First, candidate text regions are obtained by corner point detection; color clustering is then used to verify these candidates and refine their bounding boxes. The combined localization method improves both precision and recall, reduces processing time, and, in terms of localization accuracy, produces tighter bounding boxes. Finally, we enhance the detected text regions by multi-frame averaging and local thresholding. The method handles multilingual video text on complex backgrounds and across a wide range of font sizes and styles, and its output can be passed directly to an OCR system.
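The enhancement stage (multi-frame averaging followed by local thresholding) can be illustrated with a small sketch. This is not the authors' implementation: it assumes the same text region has already been located and aligned across frames, that the region is a small grayscale image stored as nested lists, and that text is brighter than its background. The paper does not state which local thresholding rule it uses, so a simple local-mean rule with a hypothetical `offset` parameter stands in here; the function names `average_frames` and `local_threshold` are likewise hypothetical.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of intensities in [0, 255].
Image = List[List[float]]


def average_frames(frames: List[Image]) -> Image:
    """Pixel-wise mean of the same (already aligned) text region over several frames.

    Static caption text keeps its intensity, while background content that
    changes between frames is pulled toward its average.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / n for x in range(width)]
        for y in range(height)
    ]


def local_threshold(img: Image, radius: int = 1, offset: float = 60.0) -> List[List[int]]:
    """Binarize each pixel against the mean of its (2*radius+1)^2 neighbourhood.

    Assumes bright text on a darker background: a pixel becomes 1 (text) when it
    exceeds the local mean by more than `offset`. Windows are clipped at borders.
    `radius` and `offset` are illustrative defaults, not values from the paper.
    """
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [img[yy][xx] for yy in ys for xx in xs]
            local_mean = sum(window) / len(window)
            row.append(1 if img[y][x] > local_mean + offset else 0)
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy 3x5 region: column 2 is a static bright stroke (250); a bright
    # background blob (250) appears at a different position in each frame.
    def frame(blob_y: int, blob_x: int) -> Image:
        img = [[30, 30, 250, 30, 30] for _ in range(3)]
        img[blob_y][blob_x] = 250
        return img

    frames = [frame(0, 0), frame(0, 4), frame(2, 0)]
    # Single frame: the blob at top-left is wrongly kept as text.
    print(local_threshold(frames[0]))  # [[1, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
    # Averaged frames: only the static stroke survives.
    print(local_threshold(average_frames(frames)))  # [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
```

In the toy run, averaging lowers the moving blob from 250 to about 103 while the static stroke stays at 250, so the same threshold now separates them. On real captions the window should be wider than the stroke width, because a plain local-mean rule leaves the interior of thick, uniform strokes at or below the threshold; rules such as Niblack's (local mean plus a multiple of the local standard deviation) are common alternatives.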
