Automatic caption localization in videos using salient points

Broadcasters are showing interest in building digital archives of their assets to support reuse of archive material in TV programs, online availability, and archiving. This requires tools for content-based video indexing and retrieval that exploit high-level video information, such as that contained in superimposed text captions. In this paper we present a method to automatically detect and localize captions in digital video using the temporal and spatial local properties of salient points in video frames. Results of experiments on both high-resolution DV sequences and standard VHS videos are presented and discussed.
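The general idea of combining temporal and spatial properties of salient points can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give. It assumes that salient points are corners found with a Harris-style response, that caption points stay fixed over several frames (temporal property), and that they cluster densely in small blocks (spatial property). All function names, the block size, and the thresholds are hypothetical choices made for illustration.

```python
import math
import numpy as np


def box_sum(a, r):
    """Sum of `a` over a (2r+1)x(2r+1) window (zero padded), via an integral image."""
    p = np.pad(a, r, mode="constant")
    ii = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    k = 2 * r + 1
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]


def harris_response(gray, r=1, kappa=0.04):
    """Harris-style corner response det(M) - kappa * trace(M)^2 at each pixel."""
    gy, gx = np.gradient(gray.astype(float))
    sxx = box_sum(gx * gx, r)
    syy = box_sum(gy * gy, r)
    sxy = box_sum(gx * gy, r)
    return sxx * syy - sxy ** 2 - kappa * (sxx + syy) ** 2


def salient_mask(gray, rel_thresh=0.25):
    """Boolean mask of pixels whose response exceeds a fraction of the frame peak."""
    resp = harris_response(gray)
    peak = resp.max()
    if peak <= 0:
        return np.zeros(resp.shape, dtype=bool)
    return resp > rel_thresh * peak


def localize_caption(frames, block=8, min_points=4, persist_frac=0.75):
    """Return (top, left, bottom, right) of blocks dense in persistent salient
    points, or None if no block qualifies. All parameters are illustrative."""
    masks = np.stack([salient_mask(f) for f in frames])
    # Temporal property (assumed): caption points are salient in most frames.
    needed = math.ceil(persist_frac * len(frames))
    persistent = masks.sum(axis=0) >= needed
    # Spatial property (assumed): caption points are dense within small blocks.
    h, w = persistent.shape
    hb, wb = h // block, w // block
    counts = (
        persistent[: hb * block, : wb * block]
        .reshape(hb, block, wb, block)
        .sum(axis=(1, 3))
    )
    rows, cols = np.nonzero(counts >= min_points)
    if rows.size == 0:
        return None
    return (
        int(rows.min()) * block,
        int(cols.min()) * block,
        int(rows.max() + 1) * block,
        int(cols.max() + 1) * block,
    )
```

Given a list of grayscale frames as 2-D arrays, `localize_caption(frames)` returns a block-aligned bounding box. Requiring persistence across frames is what suppresses corners that belong to moving scene content in this sketch. Real footage would likely need noise handling and per-region grouping that are omitted here.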
