Caption analysis and recognition for building video indexing systems

Abstract.In this paper, we propose several methods for analyzing and recognizing Chinese video captions, which constitute a very useful information source for video content. Image binarization, performed by combining a global threshold method and a window-based method, is used to obtain clearer images of characters, and a caption-tracking scheme is used to locate caption regions and detect caption changes. The separation of characters from possibly complex backgrounds is achieved by using size and color constraints and by cross examination of multiframe images. To segment individual characters, we use a dynamic split-and-merge strategy. Finally, we propose a character recognition process using a prototype classification method, supplemented by a disambiguation process using support vector machines, to improve recognition outcomes. This is followed by a postprocess that integrates multiple recognition results. The overall accuracy rate for the entire process applied to test video films is 94.11%.

[1]  Michael A. Smith,et al.  Video skimming and characterization through the combination of image and language understanding techniques , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  David S. Doermann,et al.  Progress in camera-based document image analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[3]  N. Otsu A threshold selection method from gray level histograms , 1979 .

[4]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .

[5]  Wolfgang Effelsberg,et al.  Automatic text segmentation and text recognition for video indexing , 2000, Multimedia Systems.

[6]  Seong-Whan Lee,et al.  A new methodology for gray-scale character segmentation and recognition , 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[7]  Edward K. Wong,et al.  A robust algorithm for text extraction in color video , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[8]  Rainer Lienhart,et al.  VIDEO OCR: A SURVEY AND PRACTITIONER'S GUIDE , 2003 .

[9]  Wen-Liang Hwang,et al.  Binarization of document images using Hadamard multiresolution analysis , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[10]  David S. Doermann,et al.  Text enhancement in digital video using multiple frame integration , 1999, MULTIMEDIA '99.

[11]  Takeshi Mita,et al.  Improvement of video text recognition by character selection , 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[12]  Edward M. Riseman,et al.  TextFinder: An Automatic System to Detect and Recognize Text In Images , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Xian-Sheng Hua,et al.  Efficient video text recognition using multiple frame integration , 2002, Proceedings. International Conference on Image Processing.

[14]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[15]  Fu Chang Retrieving information from document images: problems and solutions , 2001, International Journal on Document Analysis and Recognition.

[16]  John Platt,et al.  Large Margin DAG's for Multiclass Classification , 1999 .

[17]  Clement T. Yu,et al.  Techniques and Systems for Image and Video Retrieval , 1999, IEEE Trans. Knowl. Data Eng..

[18]  Toby Berger,et al.  Reliable On-Line Human Signature Verification Systems , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  David S. Doermann,et al.  Automatic text detection and tracking in digital video , 2000, IEEE Trans. Image Process..

[20]  Chun-Jen Chen,et al.  A linear-time component-labeling algorithm using contour tracing technique , 2004, Comput. Vis. Image Underst..

[21]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[22]  Minoru Mori,et al.  Telop-on-demand: video structuring and retrieval based on text recognition , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[23]  Hsin-Hsi Chen,et al.  A Simple Method for Chinese Video OCR and Its Application to Question Answering , 2001, Int. J. Comput. Linguistics Chin. Lang. Process..

[24]  Takeo Kanade,et al.  Video OCR: indexing digital news libraries by recognition of superimposed captions , 1999, Multimedia Systems.

[25]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[26]  Rainer Lienhart,et al.  Localizing and segmenting text in images and videos , 2002, IEEE Trans. Circuits Syst. Video Technol..

[27]  Yi Lu,et al.  Machine printed character segmentation --; An overview , 1995, Pattern Recognit..

[28]  Gérard Dreyfus,et al.  Single-layer learning revisited: a stepwise procedure for building and training a neural network , 1989, NATO Neurocomputing.

[29]  Chitra Dorai,et al.  Automatic text extraction from video for content-based annotation and retrieval , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[30]  H. Kamada,et al.  High-speed, high-accuracy binarization method for recognizing text in images of low spatial resolutions , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[31]  Kwang In Kim,et al.  A video indexing system using character recognition , 2000, 2000 Digest of Technical Papers. International Conference on Consumer Electronics. Nineteenth in the Series (Cat. No.00CH37102).

[32]  David J. Crandall,et al.  Robust extraction of text in video , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.