We present a method for optical character recognition (OCR) of text in video images. Recognizing videotext is challenging because of factors such as rich, dynamic backgrounds, low resolution, and color. Our strategy is to process the video images to produce high-resolution binarized text images that resemble printed text. We describe a novel clustering and relaxation procedure that combines stroke and color information to separate the text from the background. The binarized text image is then recognized with our Byblos OCR engine (Natarajan et al., 2001; Schwartz et al., 1996) using hidden Markov models trained on similar data. We present experimental results on a video-data corpus collected from broadcast news programs. The system currently achieves a character error rate of 8.3% on independent multi-font test data from this corpus.
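The abstract does not say how the character error rate is scored. As an illustration only, the sketch below assumes the conventional definition: the Levenshtein edit distance (substitutions, insertions, and deletions) between the recognized text and the reference transcription, divided by the reference length. The function names `edit_distance` and `character_error_rate` are hypothetical and are not part of the Byblos system.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a hypothesis string."""
    # prev[j] holds the distance between the first i-1 reference characters
    # and the first j hypothesis characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed CER definition)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four gives a CER of 0.25.
    print(character_error_rate("abcd", "abxd"))
```

Under this definition, a rate of 8.3% corresponds to roughly one character error for every twelve reference characters. Because insertions are counted, the rate can exceed 100% when the hypothesis is much longer than the reference.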
[1] Richard M. Schwartz, et al. Multilingual Machine Printed OCR. Int. J. Pattern Recognit. Artif. Intell., 2001.
[2] Christopher Raphael, et al. Language-independent OCR using a continuous speech recognition system. Proceedings of 13th International Conference on Pattern Recognition, 1996.
[3] Takeo Kanade, et al. Video OCR: indexing digital news libraries by recognition of superimposed captions. Multimedia Systems, 1999.
[4] David S. Doermann, et al. Automatic text detection and tracking in digital video. IEEE Trans. Image Process., 2000.
[5] Michael I. Miller, et al. Representations of Knowledge in Complex Systems. 1994.
[6] Philip A. Chou, et al. Document Image Decoding Using Markov Source Models. IEEE Trans. Pattern Anal. Mach. Intell., 1994.