Paper information - Detecting both superimposed and scene text with multiple languages and multiple alignments in video

Video text often contains highly useful semantic information that can contribute significantly to video retrieval and understanding. Video text can be classified into scene text and superimposed text. Most previous methods detect superimposed text and scene text separately because their text alignments differ. Moreover, because characters of different languages have different edge and texture features, multilingual text is very difficult to detect. In this paper, we first perform a detailed analysis of the motion patterns of video text and show that superimposed and scene text exhibit different motion patterns across consecutive frames, and that these patterns are insensitive to the language of the characters and to the text alignment. Based on this analysis, we define the Motion Perception Field (MPF) to represent the text motion patterns. Finally, we propose a text detection algorithm using MPF for both superimposed and scene text with multiple languages and multiple alignments. Experimental results on diverse videos demonstrate that the algorithm is robust and outperforms previous methods for detecting both superimposed and scene text with multiple languages and multiple alignments.

Charles X. Ling | Huadong Ma | Xiaodong Huang | Guangyu Gao
