General and domain-specific techniques for detecting and recognizing superimposed text in video

We have developed generic and domain-specific algorithms for extracting and recognizing caption text in digital video. The system has several distinctive features. For caption box localization, we combine compressed-domain features derived from DCT coefficients and motion vectors, and we use long-term temporal consistency to improve localization performance. For character segmentation, we use a single-pass, threshold-free approach that combines classification and projection to address noisy segmentation, variation in text intensity, and algorithmic complexity. For recognition, we use Zernike moments as features to achieve more accurate recognition. Finally, we exploit domain knowledge through a statistical transition graph model that improves recognition of domain-specific characters, such as the ball count and game score in baseball videos. The algorithms achieved real-time speed and significantly improved recognition accuracy. Although the experiments were conducted only on baseball videos, the algorithms (except the transition model) are general and can be applied to other domains, such as news and films.
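The abstract names Zernike moments as the recognition feature but gives no formula. Below is a minimal sketch of the standard definition, assuming a square grayscale character image whose pixel centres are mapped onto the unit disk: A_nm = (n+1)/π Σ f(x,y) V*_nm(ρ,θ), with V_nm(ρ,θ) = R_nm(ρ) e^{jmθ}, summed over pixels with ρ ≤ 1. The magnitudes |A_nm| do not change when the image is rotated, which is why they are commonly used as character features. The function names, the pixel-area normalization and max_order=8 are assumptions for illustration; the paper's actual moment orders, normalization and classifier are not stated here.

```python
import math
import numpy as np

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_{n,|m|}(rho), evaluated element-wise."""
    m = abs(m)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * math.factorial(n - s)
             / (math.factorial(s)
                * math.factorial((n + m) // 2 - s)
                * math.factorial((n - m) // 2 - s)))
        out += c * rho ** (n - 2 * s)
    return out

def zernike_moment(img, n, m):
    """Complex Zernike moment A_nm of a square grayscale image (assumed N x N)."""
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    N = img.shape[0]
    # Map pixel centres into [-1, 1] x [-1, 1]; only pixels inside the unit disk count.
    coords = (2 * np.arange(N) + 1 - N) / N
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0
    v_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)  # conjugate basis V*_nm
    d_area = (2.0 / N) ** 2  # area of one pixel in unit-disk coordinates (assumed normalization)
    return (n + 1) / math.pi * np.sum(img[mask] * v_conj[mask]) * d_area

def zernike_features(img, max_order=8):
    """Rotation-invariant feature vector of |A_nm| for 0 <= m <= n <= max_order."""
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):  # n - m must be even
            feats.append(abs(zernike_moment(img, n, m)))
    return np.array(feats)
```

As a quick check, feeding a character image and its 90-degree rotation (np.rot90) to zernike_features gives the same magnitudes up to floating-point error, while the complex moments differ only by a phase factor.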

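The abstract does not describe the structure of the statistical transition graph. One plausible reading, sketched here under explicit assumptions, treats the ball count shown in successive caption readings as a Markov chain over (balls, strikes) states and decodes the noisy OCR readings with the Viterbi algorithm, so that impossible jumps (for example, 1 ball straight to 3 balls) are rejected. The transition weights (0.6 unchanged, 0.15 per added ball or strike, 0.1 reset for a new batter), the emission model (correct with probability conf, otherwise uniform over the other states) and all function names are hypothetical and are not the authors' parameters; the game-score part is not modeled.

```python
import math
from itertools import product

# Hypothetical state space: (balls, strikes) as displayed in a baseball caption.
STATES = list(product(range(4), range(3)))

def transition_prob(prev, nxt):
    """Hypothetical probability of moving from one displayed count to the next."""
    b, s = prev
    weights = {prev: 0.6}                                # caption unchanged
    weights[(0, 0)] = weights.get((0, 0), 0.0) + 0.1     # count reset (new batter)
    if b < 3:
        weights[(b + 1, s)] = 0.15                       # one more ball
    if s < 2:
        weights[(b, s + 1)] = 0.15                       # one more strike
    total = sum(weights.values())
    return weights.get(nxt, 0.0) / total

def emission_prob(true_state, ocr_state, conf):
    """P(OCR reading | true state): correct with prob. conf, else uniform over other states."""
    if ocr_state == true_state:
        return conf
    return (1.0 - conf) / (len(STATES) - 1)

def safe_log(p):
    return math.log(p) if p > 0 else -math.inf

def viterbi(readings):
    """readings: list of (ocr_state, confidence in (0, 1)). Returns the most probable state path."""
    first_state, first_conf = readings[0]
    score = {st: safe_log(1.0 / len(STATES)) + safe_log(emission_prob(st, first_state, first_conf))
             for st in STATES}
    back = []
    for ocr_state, conf in readings[1:]:
        new_score, ptr = {}, {}
        for st in STATES:
            # Best predecessor under the transition graph.
            best_prev = max(STATES, key=lambda p: score[p] + safe_log(transition_prob(p, st)))
            new_score[st] = (score[best_prev] + safe_log(transition_prob(best_prev, st))
                             + safe_log(emission_prob(st, ocr_state, conf)))
            ptr[st] = best_prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda st: score[st])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For instance, viterbi([((1, 1), 0.9), ((3, 1), 0.6), ((1, 1), 0.9)]) returns [(1, 1), (1, 1), (1, 1)], because the graph gives zero probability to jumping from 1 ball to 3 balls, whereas a legal change such as (1, 1) followed by two readings of (2, 1) is kept.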
Shih-Fu Chang | Raj Kumar Rajendran | DongQing Zhang