Caption text location with combined features using SVM

News caption text contains useful information for video annotation, indexing, and searching. This paper presents a new method for locating caption text. First, a small, overlapping sliding window is scanned over the keyframe. Texture and edge features are then extracted from each window and fed to an SVM classifier, which distinguishes caption text from background. Finally, a voting mechanism and a morphological filter are applied to locate the caption text region precisely. The method is expected to outperform existing strategies owing to two improvements. First, it combines the texture-based and edge-based approaches, which makes the algorithm more robust to complex backgrounds and varied font styles. Second, it maintains multilingual capability throughout the processing pipeline. The proposed algorithm has been evaluated on four different TV channels, and the experiments show that it achieves high performance.
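The pipeline described above (overlapping windows, per-window texture and edge features, a classifier decision, per-pixel voting, and a morphological filter) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The two features (intensity variance and mean horizontal gradient), the hand-weighted `linear_decision` that stands in for a trained SVM, the `min_fraction` voting rule, and the row-wise closing are all hypothetical choices, since the abstract does not specify any of them.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[float]]  # grayscale keyframe, row-major
Mask = List[List[int]]     # 1 = caption text, 0 = background


def window_features(img: Image, top: int, left: int, size: int) -> List[float]:
    """Return [texture, edge] for one window (illustrative features only)."""
    rows = range(top, top + size)
    pixels = [img[r][c] for r in rows for c in range(left, left + size)]
    mean = sum(pixels) / len(pixels)
    texture = sum((p - mean) ** 2 for p in pixels) / len(pixels)  # variance
    grads = [abs(img[r][c + 1] - img[r][c])
             for r in rows for c in range(left, left + size - 1)]
    edge = sum(grads) / len(grads)  # mean absolute horizontal gradient
    return [texture, edge]


def linear_decision(features: Sequence[float]) -> bool:
    """Stand-in for a trained SVM: hypothetical hand-picked weights and bias."""
    weights, bias = (0.001, 1.0), -160.0
    return sum(w * f for w, f in zip(weights, features)) + bias > 0


def vote_mask(img: Image, size: int, step: int,
              is_text: Callable[[Sequence[float]], bool],
              min_fraction: float = 0.5) -> Mask:
    """Slide overlapping windows; keep a pixel if enough windows vote 'text'."""
    h, w = len(img), len(img[0])
    votes = [[0] * w for _ in range(h)]
    visits = [[0] * w for _ in range(h)]
    for top in range(0, h - size + 1, step):
        for left in range(0, w - size + 1, step):
            label = is_text(window_features(img, top, left, size))
            for r in range(top, top + size):
                for c in range(left, left + size):
                    visits[r][c] += 1
                    votes[r][c] += 1 if label else 0
    return [[1 if visits[r][c] and votes[r][c] >= min_fraction * visits[r][c]
             else 0 for c in range(w)] for r in range(h)]


def close_rows(mask: Mask, k: int) -> Mask:
    """Morphological closing per row with a 1 x (2k+1) structuring element."""
    def dilate(row: List[int]) -> List[int]:
        return [1 if any(row[max(0, c - k):c + k + 1]) else 0
                for c in range(len(row))]

    def erode(row: List[int]) -> List[int]:
        return [1 if all(row[max(0, c - k):c + k + 1]) else 0
                for c in range(len(row))]

    return [erode(dilate(row)) for row in mask]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, or None if mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])


if __name__ == "__main__":
    # Synthetic 8x16 frame: a striped "caption" block on a flat background.
    frame = [[255.0 if 2 <= r <= 5 and 4 <= c <= 11 and c % 2 == 0 else 0.0
              for c in range(16)] for r in range(8)]
    mask = close_rows(vote_mask(frame, size=4, step=2,
                                is_text=linear_decision), k=1)
    print(bounding_box(mask))
```

Passing the classifier in as a callable keeps the voting and filtering steps independent of the model, so a trained SVM's prediction function could be substituted for `linear_decision` without changing the rest of the sketch.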
