Locating text based on connected component and SVM

This paper presents a novel connected-component-based method for locating text in complex backgrounds using a support vector machine (SVM). The method has two stages. In the first stage, a cascade of threshold classifiers and an SVM are used to identify characters. In the second stage, the identified characters are combined into text candidates, from which text features are extracted and used to identify text regions. Two kinds of features are used to locate text regions: character features and text features. Character features discriminate character connected components (CCs) from other objects in a complex background. Text features capture the property that characters in the same text share the same size, color and font. The cascade of threshold classifiers discards most non-character objects, which improves the efficiency of character feature extraction. The SVM then identifies the characters that the cascade of threshold classifiers cannot. Experimental results demonstrate that the proposed approach is robust to different character sizes, colors and languages, and that it achieves high precision as measured on the ICDAR 2003 test database.
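
The character-identification stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the input is an already-binarized image given as a 0/1 grid and extracts 8-connected components with a breadth-first search. Three hypothetical threshold tests (height, aspect ratio, fill ratio) stand in for the cascade, and the cascade is assumed only to reject. Every component that survives all stages goes to a linear SVM decision function sign(w·x + b). The feature choice and the names `CASCADE`, `SVM_W`, `SVM_B` and `DEMO_IMAGE` are assumptions, and the weights are hand-picked placeholders rather than trained values.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Component:
    pixels: list  # (row, col) coordinates of the component's foreground pixels
    top: int
    left: int
    bottom: int   # inclusive
    right: int    # inclusive

    @property
    def height(self):
        return self.bottom - self.top + 1

    @property
    def width(self):
        return self.right - self.left + 1

    @property
    def fill_ratio(self):
        # Fraction of the bounding box covered by foreground pixels.
        return len(self.pixels) / (self.height * self.width)


def connected_components(binary):
    """Extract 8-connected foreground (value 1) regions of a 0/1 grid via BFS."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            comps.append(Component(pixels, min(ys), min(xs), max(ys), max(xs)))
    return comps


# Hypothetical cascade: cheap tests that a character CC is assumed to pass.
CASCADE = [
    lambda cc: cc.height >= 2,                       # too small to be a character
    lambda cc: 0.1 <= cc.width / cc.height <= 5.0,   # lines and bars
    lambda cc: 0.2 <= cc.fill_ratio <= 0.95,         # near-empty or solid boxes
]

# Hypothetical linear-SVM parameters over (aspect ratio, fill ratio);
# hand-picked for this demo, not trained.
SVM_W = (-0.5, -1.0)
SVM_B = 1.5


def svm_decision(cc):
    """Linear SVM decision value w.x + b; positive means 'character'."""
    x = (cc.width / cc.height, cc.fill_ratio)
    return sum(w * xi for w, xi in zip(SVM_W, x)) + SVM_B


def identify_characters(binary):
    chars = []
    for cc in connected_components(binary):
        # all() short-circuits, so later stages run only on survivors.
        if not all(stage(cc) for stage in CASCADE):
            continue
        if svm_decision(cc) > 0:
            chars.append(cc)
    return chars


DEMO_IMAGE = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # ring: character-like
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1, 1, 1],  # thin bar: rejected by the cascade
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # wide solid block: passes the cascade,
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # then rejected by the SVM
]

if __name__ == "__main__":
    found = identify_characters(DEMO_IMAGE)
    print([(c.top, c.left, c.bottom, c.right) for c in found])  # [(0, 0, 2, 2)]
```

Putting the cheap threshold tests first reflects the efficiency argument in the abstract. The SVM is evaluated only on components that survive every threshold. In the demo, the block at the bottom shows why the second classifier is needed, because it passes all three thresholds yet gets a negative SVM score.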

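The second stage, combining identified characters into texts on the basis of shared size and color, can be sketched with union-find over pairwise compatibility tests. This is again an illustrative sketch rather than the paper's procedure, and font similarity is omitted. The following are all assumptions standing in for the paper's text features:

- the thresholds `MAX_HEIGHT_RATIO`, `MAX_COLOR_DIST` and `MAX_GAP_FACTOR`;
- the same-line vertical-overlap test;
- the rule that a text region needs at least two characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    top: int
    left: int
    bottom: int   # inclusive
    right: int    # inclusive
    color: tuple  # mean (R, G, B) of the character's pixels

    @property
    def height(self):
        return self.bottom - self.top + 1


# Hypothetical thresholds, chosen only for illustration.
MAX_HEIGHT_RATIO = 1.5  # characters of one text have similar size...
MAX_COLOR_DIST = 40.0   # ...and similar color (Euclidean distance in RGB)
MAX_GAP_FACTOR = 1.0    # horizontal gap of at most one character height


def compatible(a, b):
    """Pairwise text-feature test: could a and b belong to the same text line?"""
    if max(a.height, b.height) / min(a.height, b.height) > MAX_HEIGHT_RATIO:
        return False
    dist = sum((x - y) ** 2 for x, y in zip(a.color, b.color)) ** 0.5
    if dist > MAX_COLOR_DIST:
        return False
    overlap = min(a.bottom, b.bottom) - max(a.top, b.top) + 1
    if overlap < 0.5 * min(a.height, b.height):
        return False  # not enough vertical overlap to share a line
    gap = max(a.left, b.left) - min(a.right, b.right) - 1
    return gap <= MAX_GAP_FACTOR * max(a.height, b.height)


def group_into_texts(chars, min_chars=2):
    """Union compatible characters; return bounding boxes of large-enough groups."""
    parent = list(range(len(chars)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            if compatible(chars[i], chars[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, ch in enumerate(chars):
        groups.setdefault(find(i), []).append(ch)

    regions = []
    for members in groups.values():
        if len(members) < min_chars:
            continue  # an isolated character is not accepted as text
        regions.append((min(m.top for m in members), min(m.left for m in members),
                        max(m.bottom for m in members), max(m.right for m in members)))
    return sorted(regions)


chars = [
    CharBox(10, 10, 29, 24, (250, 250, 250)),
    CharBox(10, 30, 29, 44, (245, 248, 250)),
    CharBox(12, 50, 29, 62, (250, 250, 245)),
    CharBox(100, 10, 119, 24, (20, 20, 200)),   # other color, other line
    CharBox(10, 70, 69, 100, (250, 250, 250)),  # three times taller
]
print(group_into_texts(chars))  # [(10, 10, 29, 62)]
```

Compatibility is tested only pairwise, so union-find lets the first and third boxes join the same text through the middle one, even though the gap between them alone exceeds the threshold.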