An automatic performance evaluation protocol for video text detection algorithms

Text appearing in videos provides important supplemental information for video indexing and retrieval. Although much work has been devoted to text detection in videos, performance evaluation protocols for video text detection are still lacking. In this paper, we propose an objective and comprehensive performance evaluation protocol for video text detection algorithms. The protocol includes a positive set and a negative set of indices at the textbox level, which evaluate detection quality in terms of both the location accuracy and the fragmentation of the detected textboxes. In the protocol, we assign a detection difficulty (DD) level to each ground-truth textbox. The performance indices can then be normalized with respect to the textbox DD level and are therefore, to a certain degree, tolerant of differences in ground-truth difficulty. We also assign a detectability index (DI) value to each ground-truth textbox. The overall detection rate is the DI-weighted average of the detection qualities of all ground-truth textboxes, which makes the detection rate a more accurate reflection of actual performance. The automatic evaluation scheme has been applied to a text detection approach to determine the thresholds that yield the best detection results. The protocol has also been employed to compare the performance of several text detection systems. Hence, we believe that the proposed protocol can be used to compare different video/image text detection algorithms and systems, and can help in improving, selecting, and designing text detection methods.
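As a rough illustration of the DI-weighted detection rate described above, the following Python sketch computes a weighted average of per-textbox detection qualities. The abstract does not give the exact formula, so normalizing by the sum of the DI values is an assumption here, and the function name, the value ranges, and the sample numbers are hypothetical.

```python
from typing import Sequence


def di_weighted_detection_rate(
    qualities: Sequence[float], di_values: Sequence[float]
) -> float:
    """Weighted average of detection qualities, using DI values as weights.

    Assumption: the rate is sum(DI_i * Q_i) / sum(DI_i); the abstract only
    states that it is a "DI-weighted average".
    """
    if len(qualities) != len(di_values):
        raise ValueError("one quality and one DI value per ground-truth textbox")
    total_weight = sum(di_values)
    if total_weight <= 0:
        raise ValueError("DI values must sum to a positive number")
    weighted_sum = sum(q * d for q, d in zip(qualities, di_values))
    return weighted_sum / total_weight


# Hypothetical example: three ground-truth textboxes.
# The third textbox is missed (quality 0.0) but is hard to detect (low DI),
# so it lowers the overall rate less than it would in a plain average.
qualities = [1.0, 0.5, 0.0]
di_values = [1.0, 1.0, 0.5]

rate = di_weighted_detection_rate(qualities, di_values)
plain_mean = sum(qualities) / len(qualities)
print(f"DI-weighted rate: {rate:.2f}, unweighted mean: {plain_mean:.2f}")
```

In this sketch a missed textbox with a low DI value is penalized less than a missed textbox that is easy to detect, which is one plausible reading of why the weighted rate is described as reflecting performance more accurately.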
