Text detection, localization, and tracking in compressed video

Video text plays an important role in semantic video analysis, indexing, and retrieval, because text in a video is closely related to its content. Text-based video analysis, browsing, and retrieval usually proceed through the fundamental steps of video text detection, localization, tracking, segmentation, and recognition. Since video sequences are commonly stored in compressed formats, typically using MPEG coding, this paper proposes a unified framework for text detection, localization, and tracking in compressed videos that operates directly on discrete cosine transform (DCT) coefficients. A coarse-to-fine text detection method finds text blocks from block-level DCT texture intensity, where the texture intensity of each 8×8 block of an intra-frame is approximated by seven AC coefficients. The candidate text block regions are then verified and refined. Text regions are localized and tracked using the horizontal and vertical projection profiles of block texture intensity, and text tracking determines the frames in which each text line appears and disappears. Experimental results demonstrate the effectiveness of the proposed methods.
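
The abstract does not say which seven AC coefficients are used or how the coarse and fine stages are thresholded. The sketch below is a minimal illustration under those gaps: it assumes a hypothetical set of seven low-frequency positions (three horizontal, three vertical, one diagonal), takes the sum of their absolute values as the block texture intensity, and uses an assumed left/right-neighbour rule as the refinement step. The names `AC_POSITIONS`, `texture_intensity`, `detect_text_blocks`, and the thresholds are illustrative, not the authors' choices.

```python
from typing import List

Block = List[List[float]]  # 8x8 DCT coefficients of one intra-coded block

# Hypothetical choice of seven AC positions (row, col): three horizontal,
# three vertical and one diagonal. The abstract does not specify them.
AC_POSITIONS = [(0, 1), (0, 2), (0, 3), (1, 0), (2, 0), (3, 0), (1, 1)]


def texture_intensity(block: Block) -> float:
    """Approximate block texture intensity as the sum of |AC| at AC_POSITIONS."""
    return float(sum(abs(block[r][c]) for r, c in AC_POSITIONS))


def detect_text_blocks(intensity: List[List[float]], coarse_thr: float,
                       min_neighbors: int = 1) -> List[List[bool]]:
    """Coarse-to-fine labelling of a grid of block texture intensities.

    Coarse step: a block is a candidate if its intensity exceeds coarse_thr.
    Fine step (assumed rule): keep a candidate only if at least min_neighbors
    of its left/right neighbours are also candidates, since characters in a
    text line sit side by side.
    """
    rows, cols = len(intensity), len(intensity[0])
    cand = [[intensity[i][j] > coarse_thr for j in range(cols)]
            for i in range(rows)]
    result = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if cand[i][j]:
                neighbours = sum(1 for dj in (-1, 1)
                                 if 0 <= j + dj < cols and cand[i][j + dj])
                result[i][j] = neighbours >= min_neighbors
    return result


if __name__ == "__main__":
    flat = [[0.0] * 8 for _ in range(8)]
    flat[0][0] = 1024.0                      # DC only: smooth background
    edgy = [[0.0] * 8 for _ in range(8)]
    edgy[0][1], edgy[1][0], edgy[1][1] = 120.0, -80.0, 40.0
    print(texture_intensity(flat), texture_intensity(edgy))  # 0.0 240.0
    grid = [[5, 300, 310, 5], [5, 290, 5, 5]]
    print(detect_text_blocks(grid, coarse_thr=100))
    # [[False, True, True, False], [False, False, False, False]]
```

Because the coefficients of intra-coded blocks are available after entropy decoding and dequantization, this kind of feature needs no inverse DCT, which is the main appeal of working in the compressed domain. In the example, the isolated candidate in the second row is discarded by the refinement step.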

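One way to realise the projection-profile localization and the appear/disappear bookkeeping is sketched below. It assumes a 2-D map of per-block text scores (for example, the output of a detection step), hypothetical row and column thresholds, and a matching rule that links text boxes in consecutive analysed intra-frames when their overlap ratio is at least 0.5. The matching rule and every name here (`runs`, `localize`, `overlap_ratio`, `track`) are assumptions for illustration, not the paper's method.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), block units, inclusive


def runs(profile: List[float], thr: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where profile >= thr."""
    out: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, v in enumerate(profile):
        if v >= thr and start is None:
            start = k
        elif v < thr and start is not None:
            out.append((start, k - 1))
            start = None
    if start is not None:
        out.append((start, len(profile) - 1))
    return out


def localize(text_map: List[List[float]], row_thr: float,
             col_thr: float) -> List[Box]:
    """Split rows with the horizontal projection, then columns within each band."""
    boxes: List[Box] = []
    h_profile = [sum(row) for row in text_map]
    for top, bottom in runs(h_profile, row_thr):
        band = text_map[top:bottom + 1]
        v_profile = [sum(row[j] for row in band) for j in range(len(band[0]))]
        for left, right in runs(v_profile, col_thr):
            boxes.append((top, bottom, left, right))
    return boxes


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box."""
    h = min(a[1], b[1]) - max(a[0], b[0]) + 1
    w = min(a[3], b[3]) - max(a[2], b[2]) + 1
    if h <= 0 or w <= 0:
        return 0.0
    area_a = (a[1] - a[0] + 1) * (a[3] - a[2] + 1)
    area_b = (b[1] - b[0] + 1) * (b[3] - b[2] + 1)
    return h * w / min(area_a, area_b)


def track(frames: List[List[Box]],
          min_overlap: float = 0.5) -> List[Tuple[Box, int, int]]:
    """Return (box, appearing_frame, disappearing_frame) for each text line.

    A line stays alive while some unmatched box in the next frame overlaps
    it; its disappearing frame is the last frame in which it was matched.
    """
    active: Dict[int, Tuple[Box, int]] = {}  # id -> (latest box, first frame)
    done: List[Tuple[Box, int, int]] = []
    next_id = 0
    for t, boxes in enumerate(frames):
        matched = set()
        new_active: Dict[int, Tuple[Box, int]] = {}
        for tid, (prev, first) in active.items():
            hit = next((k for k, b in enumerate(boxes) if k not in matched
                        and overlap_ratio(prev, b) >= min_overlap), None)
            if hit is None:
                done.append((prev, first, t - 1))   # line vanished at frame t
            else:
                matched.add(hit)
                new_active[tid] = (boxes[hit], first)
        for k, b in enumerate(boxes):
            if k not in matched:                    # a new line appears
                new_active[next_id] = (b, t)
                next_id += 1
        active = new_active
    last = len(frames) - 1
    done.extend((box, first, last) for box, first in active.values())
    return done
```

For example, `localize([[0, 0, 0, 0, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 0], [1, 1, 0, 0, 0]], 1, 1)` yields two boxes, `(1, 1, 1, 3)` and `(3, 3, 0, 1)`. Frame indices in `track` count analysed intra-frames; mapping them to display frame numbers depends on the GOP structure of the stream.
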