A spatial-temporal approach for video caption detection and recognition

We present a video caption detection and recognition system based on a fuzzy-clustering neural network (FCNN) classifier. Using a novel caption-transition detection scheme we locate both spatial and temporal positions of video captions with high precision and efficiency. Then employing several new character segmentation and binarization techniques, we improve the Chinese video-caption recognition accuracy from 13% to 86% on a set of news video captions. As the first attempt on Chinese video-caption recognition, our experiment results are very encouraging.

[1]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[2]  Takafumi Miyatake,et al.  IMPACT: an interactive natural-motion-picture dedicated multimedia authoring system , 1991, CHI.

[3]  Akio Nagasaka,et al.  Automatic Video Indexing and Full-Video Search for Object Appearances , 1991, VDB.

[4]  Keinosuke Fukunaga,et al.  Statistical Pattern Recognition , 1993, Handbook of Pattern Recognition and Computer Vision.

[5]  James C. Bezdek,et al.  Generalized clustering networks and Kohonen's self-organizing scheme , 1993, IEEE Trans. Neural Networks.

[6]  Shigeru Akamatsu,et al.  Recognizing Characters in Scene Images , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Sankar K. Pal,et al.  Self-organizing neural network as a fuzzy classifier , 1994, IEEE Trans. Syst. Man Cybern..

[8]  Atreyi Kankanhalli,et al.  Automatic Extraction of Characters in Complex Scene Images , 1995, Int. J. Pattern Recognit. Artif. Intell..

[9]  Anil K. Jain,et al.  Locating text in complex color images , 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[10]  Anil K. Jain,et al.  Locating text in complex color images , 1995, Pattern Recognit..

[11]  Marco Campani,et al.  Robust method for road sign detection and recognition , 1996, Image Vis. Comput..

[12]  Takeo Kanade,et al.  Intelligent Access to Digital Video: Informedia Project , 1996, Computer.

[13]  Hang Joon Kim,et al.  A recognition of vehicle license plate using a genetic algorithm based segmentation , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[14]  Rainer Lienhart,et al.  Automatic text recognition in digital videos , 1995, Electronic Imaging.

[15]  Hae-Kwang Kim,et al.  Efficient Automatic Text Location Method and Content-Based Indexing and Structuring of Video Database , 1996, J. Vis. Commun. Image Represent..

[16]  Qian Huang,et al.  Character extraction of license plates from video , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Edward M. Riseman,et al.  Finding text in images , 1997, DL '97.

[18]  Andrew Merlino,et al.  Segmentation, Content Extraction and Visualization of Broadcast News Video using Multistream Analysis , 1997 .

[19]  Rama Chellappa,et al.  Multiscale Segmentation of Unstructured Document Pages Using Soft Decision Integration , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Shoji Kurakake,et al.  Recognition and visual feature matching of text region in video for conceptual indexing , 1997, Electronic Imaging.

[21]  Daniel P. Lopresti,et al.  Finding text in color images , 1998, Electronic Imaging.

[22]  Chitra Dorai,et al.  Automatic text extraction from video for content-based annotation and retrieval , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[23]  Takeo Kanade,et al.  Video skimming and characterization through the combination of image and language understanding , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[24]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[25]  David S. Doermann,et al.  Text Extraction, Enhancement and OCR in Digital Video , 1998, Document Analysis Systems.

[26]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[27]  Takeo Kanade,et al.  Video OCR: indexing digital news libraries by recognition of superimposed captions , 1999, Multimedia Systems.

[28]  David J. Crandall,et al.  A system for automatic text detection in video , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[29]  Yasuo Ariki,et al.  Telop and flip frame detection and character extraction from TV news articles , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[30]  Horst Bunke,et al.  Identification of text on colored book and journal covers , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[31]  Nevenka Dimitrova,et al.  Text detection for video analysis , 1999, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL'99).

[32]  Qian Huang,et al.  Automated generation of news content hierarchy by integrating audio, video, and text information , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[33]  Clement T. Yu,et al.  Techniques and Systems for Image and Video Retrieval , 1999, IEEE Trans. Knowl. Data Eng..

[34]  Hang Joon Kim,et al.  Neural network-based text location for news video indexing , 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).

[35]  Edward M. Riseman,et al.  TextFinder: An Automatic System to Detect and Recognize Text In Images , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  David Zhang,et al.  A fuzzy clustering neural networks (FCNs) system design methodology , 2000, IEEE Trans. Neural Networks Learn. Syst..

[37]  C. Garcia,et al.  Text detection and segmentation in complex color images , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[38]  Rainer Lienhart,et al.  On the segmentation of text in videos , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[39]  Edward K. Wong,et al.  A robust algorithm for text extraction in color video , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[40]  Minoru Mori,et al.  Telop-on-demand: video structuring and retrieval based on text recognition , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[41]  Yannis Avrithis,et al.  Broadcast news parsing using visual cues: a robust face detection approach , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[42]  David S. Doermann,et al.  Automatic text detection and tracking in digital video , 2000, IEEE Trans. Image Process..