A Spatio-temporal Approach for Video Caption Extraction

Captions in videos play an important role in video indexing and retrieval. In this paper, we propose a novel algorithm for extracting multilingual captions from video. Our approach is based on the analysis of spatio-temporal slices of video. If a horizontal (or vertical) scan line contains pixels of a caption region, the corresponding spatio-temporal slice will have bar-code-like patterns. By integrating the structural information of the bar-code-like patterns in horizontal and vertical slices, the spatial and temporal positions of video captions can be located accurately. Experimental results show that the proposed algorithm is effective and outperforms some existing techniques.
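To make the notion of a spatio-temporal slice concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a grayscale video stored as a NumPy array of shape (T, H, W), builds a synthetic clip with a static striped "caption", and shows that a slice through the caption is constant along the time axis (the bar-code-like stripes) while a slice through the background is not. All function names, array sizes, and the stability measure are hypothetical choices made for illustration.

```python
import numpy as np


def horizontal_slice(video, y):
    """Stack scan line (row) y of every frame; result has shape (T, W)."""
    return video[:, y, :]


def vertical_slice(video, x):
    """Stack scan line (column) x of every frame; result has shape (T, H)."""
    return video[:, :, x]


def make_synthetic_video(t=30, h=40, w=60, seed=0):
    """Noisy background plus a static striped region standing in for a caption.

    The 'caption' (an assumption for this sketch) occupies frames 5..24,
    rows 30..35, and columns 10..49.
    """
    rng = np.random.default_rng(seed)
    video = rng.integers(0, 256, size=(t, h, w)).astype(np.float64)
    # Alternating bright/dark strokes across the caption's columns.
    strokes = np.where(np.arange(10, 50) % 4 < 2, 255.0, 0.0)
    video[5:25, 30:36, 10:50] = strokes  # broadcast over frames and rows
    return video


def temporal_change(slice_2d):
    """Mean absolute difference between consecutive time rows of a slice."""
    return float(np.abs(np.diff(slice_2d, axis=0)).mean())


video = make_synthetic_video()

# Slice through the caption (row 32) versus through the background (row 5),
# both restricted to the caption's frames and columns.
caption_part = horizontal_slice(video, 32)[5:25, 10:50]
background_part = horizontal_slice(video, 5)[5:25, 10:50]

print("temporal change in caption slice:   ", temporal_change(caption_part))
print("temporal change in background slice:", temporal_change(background_part))
```

In this toy setting the caption slice has zero temporal change because the strokes do not move, whereas the noisy background changes from frame to frame. Real video would need a tolerance rather than an exact comparison, and locating the caption would require combining evidence from both horizontal and vertical slices, as the abstract describes.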
