Automatic Caption Localization in Compressed Video

We present a method to automatically localize captions in JPEG compressed images and the I-frames of MPEG compressed videos. Caption text regions are segmented from the background using their distinguishing texture characteristics. Unlike previously published methods, which fully decompress the video sequence before extracting text regions, this method locates candidate caption text regions directly in the DCT compressed domain, using the intensity variation information encoded in the DCT coefficients. Therefore, only a very small amount of decoding is required. The proposed algorithm takes about 0.006 seconds to process a 240 × 350 image and achieves a recall rate of 99.17 percent while falsely accepting about 1.87 percent of nontext DCT blocks on a variety of MPEG compressed videos containing more than 2,300 I-frames.
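The idea of flagging candidate text blocks from DCT coefficients can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the choice of coefficients (all horizontal AC terms in the first row of each 8 × 8 block), and the threshold are assumptions made for illustration. The helper `block_dct` stands in for a decoder that would read the coefficients straight from the JPEG or MPEG bitstream; here it computes them from pixels only so that the example is self-contained.

```python
import numpy as np

BLOCK = 8  # JPEG and MPEG I-frames use 8 x 8 DCT blocks


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def block_dct(image):
    """Hypothetical stand-in for the decoder.

    Returns coefficients with shape (H/8, W/8, 8, 8), indexed as
    [block_row, block_col, vertical_freq, horizontal_freq].
    """
    h8, w8 = image.shape[0] // BLOCK, image.shape[1] // BLOCK
    cropped = image[: h8 * BLOCK, : w8 * BLOCK].astype(float)
    blocks = cropped.reshape(h8, BLOCK, w8, BLOCK).swapaxes(1, 2)
    d = dct_matrix()
    return d @ blocks @ d.T  # 2-D DCT applied to every block


def horizontal_energy(coeffs, lo=1, hi=7):
    """Sum of |AC| terms with zero vertical frequency (assumed range lo..hi).

    These terms respond to intensity variation along the horizontal
    direction, such as the vertical strokes of characters.
    """
    return np.abs(coeffs[:, :, 0, lo : hi + 1]).sum(axis=-1)


def candidate_text_blocks(coeffs, threshold):
    """Boolean mask of blocks whose horizontal energy exceeds the threshold."""
    return horizontal_energy(coeffs) > threshold
```

In a real compressed-domain pipeline the pixel-to-DCT step would be skipped entirely, which is where the speed advantage described above comes from. The mask produced here is only a first-pass candidate map; a complete system would still need further refinement and verification steps to reject high-texture nontext regions.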
