Automatic text regions location in video frames

Content-based information retrieval from digital video databases and media archives is a challenging problem that is rapidly gaining widespread research and commercial interest. For reliable retrieval and intelligent access to video programs, indexing should provide semantic descriptors. One way to bring more semantic knowledge into the indexing process is to use the text embedded within images and video sequences, such as credit titles, ellipse, etc. Text in video is rich in information and easy to use, e.g. through keyword-based queries. In this paper we propose a technique for automatically locating text regions in digital video frames. The detected text boxes can then be passed to standard commercial OCR software to obtain the full text used for video indexing. Our method relies on four main image processing techniques to locate text regions: adaptive binarization, multi-resolution analysis, histogram segmentation and morphological operations. A new technique for histogram segmentation based on optimum thresholding is also proposed. The quality of the localized text is supported by experiments that we conducted on a large sample of video frames selected from various kinds of video programs (commercials, TV news, full-length films, etc.). Finally, the results of text region localization are presented. This work is part of the CMCU Project undertaken
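As a rough illustration of how histogram thresholding and a morphological operation can combine to yield a text box, the sketch below may help. It is not the method of this paper: Otsu's between-class-variance criterion stands in for the proposed optimum thresholding, the multi-resolution and adaptive steps are omitted, text is assumed to be brighter than the background, and a single bounding box is returned. All function names and the `radius` parameter are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Threshold t maximising between-class variance (Otsu); assumes a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean of the class <= t
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                          # skip thresholds with an empty class
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def dilate_horizontal(mask, radius):
    """Binary dilation with a horizontal line of half-width `radius`,
    used here to merge neighbouring characters into one region."""
    out = mask.copy()
    for s in range(1, radius + 1):
        out[:, s:] |= mask[:, :-s]             # spread foreground to the right
        out[:, :-s] |= mask[:, s:]             # spread foreground to the left
    return out

def bounding_box(mask):
    """(top, left, bottom, right) of all foreground pixels, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

def locate_text_box(gray, radius=3):
    """Threshold, dilate, and box the bright regions (assumption: bright text)."""
    mask = gray > otsu_threshold(gray)
    return bounding_box(dilate_horizontal(mask, radius))

# Synthetic frame: dark background with three bright "characters" on one line.
frame = np.full((40, 100), 30, dtype=np.uint8)
for left in (20, 30, 40):
    frame[10:20, left:left + 5] = 220
box = locate_text_box(frame, radius=3)
```

In a fuller pipeline the resulting box would be cropped from the frame and handed to OCR; a real detector would also need to separate multiple text lines and handle dark text on bright backgrounds.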
