Improved text overlay detection in videos using a fusion-based classifier

In this paper, we adopt classifier fusion to improve the performance of our text overlay detection in the NIST TREC-2002 video retrieval benchmark. We explore a normalized ensemble fusion that combines two text overlay detection models. The fusion consists of three steps: normalization of confidence scores, aggregation via a combiner function, and an optimized selection. The proposed fusion classifier ranked best among the 11 detectors submitted to the NIST text overlay detection benchmark, and its average precision is 227% of that of the second-best detector in the benchmark.
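The three fusion steps named above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: the abstract does not say which normalizations, combiner functions, or selection criterion were used, so the min-max and z-score normalizers, the average/max/min combiners, and selection by non-interpolated average precision on labeled validation data are all assumptions, and every function name here is hypothetical.

```python
from itertools import product
from statistics import mean, pstdev

def minmax_norm(scores):
    """Rescale scores to [0, 1] (assumed normalization, not from the paper)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def zscore_norm(scores):
    """Shift to zero mean and unit variance (assumed normalization)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]

# Candidate normalizers and combiners; both sets are illustrative assumptions.
NORMALIZERS = {"minmax": minmax_norm, "zscore": zscore_norm}
COMBINERS = {"avg": lambda a, b: (a + b) / 2, "max": max, "min": min}

def average_precision(scores, labels):
    """Non-interpolated average precision of a ranking induced by scores."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this relevant item
    return total / hits if hits else 0.0

def fuse(scores_a, scores_b, norm_name, comb_name):
    """Normalize each detector's scores, then combine them per shot."""
    norm = NORMALIZERS[norm_name]
    comb = COMBINERS[comb_name]
    return [comb(a, b) for a, b in zip(norm(scores_a), norm(scores_b))]

def select_best_fusion(scores_a, scores_b, labels):
    """Try every (normalizer, combiner) pair; keep the highest validation AP."""
    best = None
    for norm_name, comb_name in product(NORMALIZERS, COMBINERS):
        fused = fuse(scores_a, scores_b, norm_name, comb_name)
        ap = average_precision(fused, labels)
        if best is None or ap > best[0]:
            best = (ap, norm_name, comb_name)
    return best

# Toy validation data: two detectors with different score ranges.
detector_a = [0.9, 0.2, 0.6, 0.1]
detector_b = [10, 40, 30, 5]
has_overlay = [1, 0, 1, 0]
print(select_best_fusion(detector_a, detector_b, has_overlay))
```

Normalizing before combining matters because the two detectors may report confidences on different scales; without it, a max or average combiner would be dominated by whichever detector has the larger numeric range.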
