Binarization by Local K-means Clustering for Korean Text Extraction

Text in natural scenes carries important information for understanding images, and detecting and extracting it has been used in many applications. However, many conditions of natural scenes make text segmentation difficult. This paper proposes an effective method for segmenting and binarizing Korean text from signboard images that is robust to blur, uneven illumination, and text with strong boundaries. The approach applies local K-means clustering to the separate words in a text region. First, the detected text region is divided into local areas with relatively uniform illumination; then 3-means clustering with Euclidean distance is applied within each area to segment the text from the background. Dividing the region of interest into local areas minimizes the effect of uneven lighting. The method is compared with Otsu's method and with intensity-based 2-means clustering using several metrics. Experiments use natural images from a test database collected with mobile devices, and the results demonstrate the performance of the proposed method.
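The local clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the features are RGB color vectors, the local areas are equal-width vertical strips (a stand-in for word-level areas), and the darkest of the three clusters is taken as text. The function names `kmeans` and `binarize_local` are hypothetical.

```python
import numpy as np

def kmeans(points, k=3, iters=20):
    """Plain Lloyd's k-means with Euclidean distance on an (n, d) float array."""
    # Deterministic initialization: take k points spread across the brightness order.
    order = np.argsort(points.sum(axis=1))
    picks = np.linspace(0, len(points) - 1, k).astype(int)
    centers = points[order[picks]].copy()

    def assign(c):
        dist = np.linalg.norm(points[:, None, :] - c[None, :, :], axis=2)
        return dist.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return assign(centers), centers

def binarize_local(region, n_blocks=2, k=3):
    """Binarize an (H, W, 3) text region strip by strip; True marks text pixels."""
    h, w, _ = region.shape
    mask = np.zeros((h, w), dtype=bool)
    edges = np.linspace(0, w, n_blocks + 1).astype(int)
    for x0, x1 in zip(edges[:-1], edges[1:]):
        block = region[:, x0:x1].reshape(-1, 3).astype(float)
        labels, centers = kmeans(block, k)
        # Assumption: the cluster with the darkest center is the text.
        text_cluster = centers.sum(axis=1).argmin()
        mask[:, x0:x1] = (labels == text_cluster).reshape(h, x1 - x0)
    return mask

# Synthetic example: bright background with a left-to-right illumination
# gradient and two dark "characters".
region = np.full((8, 16, 3), 200.0) + 2 * np.arange(16)[None, :, None]
region[2:6, 3:5] = 20.0
region[2:6, 10:12] = 25.0
mask = binarize_local(region, n_blocks=2)
```

Because each strip is clustered independently, the cluster centers adapt to the local brightness, which is the intuition behind dividing the region before clustering. A strip that contains no text would still have its darkest cluster marked as text in this sketch, so a practical version would need an additional check.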
