Binarization of Color Characters in Scene Images Using k-means Clustering and Support Vector Machines

This paper proposes a new technique for binarizing multicolored characters subject to heavy degradation. The key ideas are threefold. The first is the generation of tentatively binarized images via every dichotomization of the k clusters obtained by k-means clustering in the HSI color space; the total number of tentatively binarized images equals 2^k−2. The second is the use of a support vector machine (SVM) to determine whether, and to what degree, each tentatively binarized image represents a character or a non-character. We feed the SVM with mesh and weighted direction code histogram features, and it outputs the degree of “character-likeness.” The third is the selection of the single binarized image with the maximum degree of “character-likeness” as the optimal binarization result. Experiments using a total of 1000 single-character color images extracted from the ICDAR 2003 robust OCR dataset show that the proposed method achieves a correct binarization rate of 93.7%.
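The enumeration and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the k-means step has already produced a 2-D grid of cluster indices (here a hand-written `label_map`), and the `score` argument is a hypothetical stand-in for the SVM's "character-likeness" output. All function names are illustrative.

```python
from itertools import product


def dichotomizations(k):
    """Yield every assignment of k clusters to foreground (1) or background (0).

    The two trivial assignments (all background, all foreground) are skipped,
    which leaves 2**k - 2 assignments.
    """
    for bits in product((0, 1), repeat=k):
        if 0 < sum(bits) < k:
            yield bits


def tentative_binarizations(label_map, k):
    """Turn a 2-D grid of cluster indices into 2**k - 2 binary images."""
    for bits in dichotomizations(k):
        yield bits, [[bits[c] for c in row] for row in label_map]


def select_best(label_map, k, score):
    """Return the (assignment, image) pair whose image scores highest.

    `score` is a placeholder (assumption) for a trained classifier's
    character-likeness output.
    """
    return max(tentative_binarizations(label_map, k),
               key=lambda pair: score(pair[1]))


# Toy 3x3 grid of cluster indices for k = 3 (hypothetical data).
label_map = [[0, 1, 1],
             [2, 1, 0],
             [2, 1, 0]]

candidates = list(tentative_binarizations(label_map, 3))


def toy_score(image):
    """Toy score (assumption): prefer images with exactly 4 foreground pixels."""
    return -abs(sum(map(sum, image)) - 4)


best_bits, best_image = select_best(label_map, 3, toy_score)
```

For k = 3 this produces 2^3−2 = 6 candidate images. In a real pipeline the toy score would be replaced by a classifier trained on character and non-character samples.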
