Extraction of special effects caption text events from digital video

Abstract. The popularity of digital video is increasing rapidly. To help users navigate video libraries, algorithms that automatically index video by content are needed. One approach is to extract the text appearing in video, which often reflects a scene's semantic content. This is a difficult problem because general-purpose video is unconstrained: text can have arbitrary color, size, and orientation, and backgrounds may be complex and changing. Most existing work makes restrictive assumptions about the nature of text in video and is therefore not directly applicable to unconstrained, general-purpose video. Most prior work has also focused only on detecting the spatial extent of text in individual video frames. However, text in video usually persists for several seconds, forming a text event that should be entered in the video index only once. It is therefore also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but have so far received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video, proposes a solution to each of these problems, and compares these solutions with existing work in the literature.
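A caption that persists across many frames should be indexed once, as a single text event. The sketch below illustrates one simple way to recover that temporal extent: greedily linking per-frame text boxes across frames by bounding-box overlap (intersection-over-union). It is not the method proposed in this paper. The box format, the helpers `iou` and `group_text_events`, and the thresholds `min_iou`, `max_gap`, and `min_length` are all hypothetical. The sketch assumes a per-frame detector already supplies axis-aligned boxes. Because each new box is compared with the event's most recent box, gradual motion, growth, and shrinking are tolerated. Rotation and abrupt jumps are not handled.

```python
from dataclasses import dataclass, field

# Hypothetical box format: (x0, y0, x1, y1) in pixels, x1/y1 exclusive.
Box = tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


@dataclass
class TextEvent:
    start: int  # first frame in which the text was seen
    end: int    # last frame in which the text was seen
    boxes: dict[int, Box] = field(default_factory=dict)  # frame -> box


def group_text_events(detections: dict[int, list[Box]],
                      min_iou: float = 0.3,
                      max_gap: int = 2,
                      min_length: int = 3) -> list[TextEvent]:
    """Greedily link per-frame text boxes into events (hypothetical helper).

    A box joins the active event whose most recent box overlaps it best
    (IoU >= min_iou). An event is closed after more than max_gap frames
    without a match.
    """
    active: list[TextEvent] = []
    finished: list[TextEvent] = []
    for frame in sorted(detections):
        # Close events that have gone unmatched for more than max_gap frames.
        still_active = []
        for ev in active:
            if frame - ev.end - 1 > max_gap:
                finished.append(ev)
            else:
                still_active.append(ev)
        active = still_active

        used: set[int] = set()  # events already claimed in this frame
        for box in detections[frame]:
            best_i, best_score = -1, min_iou
            for i, ev in enumerate(active):
                if i in used:
                    continue
                score = iou(ev.boxes[ev.end], box)
                if score >= best_score:
                    best_i, best_score = i, score
            if best_i >= 0:
                # Extend the best-matching event with this frame's box.
                active[best_i].end = frame
                active[best_i].boxes[frame] = box
                used.add(best_i)
            else:
                # No sufficient overlap: start a new text event.
                active.append(TextEvent(frame, frame, {frame: box}))
                used.add(len(active) - 1)

    # Drop short-lived events, which are more likely false detections.
    events = [ev for ev in finished + active
              if ev.end - ev.start + 1 >= min_length]
    return sorted(events, key=lambda ev: ev.start)


detections = {
    0: [(10, 100, 90, 120)],
    1: [(12, 100, 92, 120)],
    2: [(14, 100, 94, 120)],
    4: [(16, 100, 96, 120)],  # frame 3: the detector missed the caption
    5: [(200, 10, 260, 30)],  # unrelated one-frame blob
}
print([(ev.start, ev.end) for ev in group_text_events(detections)])  # [(0, 4)]
```

The slowly drifting caption becomes one event spanning frames 0 to 4 despite the missed frame, while the isolated blob is discarded as too short. Greedy matching keeps the sketch small. Crowded scenes might instead call for optimal one-to-one assignment between boxes and events.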
