Combination of binarization and character segmentation using color information

Character segmentation and recognition have been studied for several decades, especially for typewritten characters acquired with scanners. Commercial OCR software performs well on "clean" documents, or requires the user to select the document type. Recently, a new kind of image has appeared: pictures taken with a camera in a "real-world" environment. Such images suffer from strong degradations that are absent from scanned documents, and they often contain complex backgrounds. To segment text as accurately as possible, a new method is proposed that uses color information to extract the text. This paper discusses each chosen parameter and compares the results with those of several recent techniques that also use color information. Moreover, emphasis is placed on stroke analysis and character segmentation: the binarization method takes them into account in order to improve the subsequent character segmentation and recognition.
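The abstract does not detail the method, so the following is only an illustrative sketch of the general idea of pairing color-based binarization with character segmentation, not the authors' algorithm. It assumes a simple swapped-in technique: pixels are clustered into two groups in RGB space with k-means (k = 2), the smaller cluster is taken to be text (an assumption that often holds but not always), and 8-connected components of the resulting mask are returned as candidate characters. The function names `color_binarize` and `connected_components` are hypothetical.

```python
import numpy as np
from collections import deque


def color_binarize(image, iters=20):
    """Binarize an RGB image by 2-means clustering of pixel colors.

    Returns a boolean mask that is True for the smaller cluster, which this
    sketch assumes to be text (text usually covers fewer pixels than background).
    """
    h, w = image.shape[:2]
    pixels = image.reshape(-1, 3).astype(float)
    # Deterministic initialisation: darkest and brightest pixels as seeds.
    brightness = pixels.sum(axis=1)
    centers = pixels[[brightness.argmin(), brightness.argmax()]].copy()
    labels = None
    for _ in range(iters):
        # Assign each pixel to the nearest center (squared Euclidean distance in RGB).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each center to the mean color of its members (keep it if the cluster is empty).
        for k in range(2):
            members = pixels[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=2)
    text_label = 0 if counts[0] <= counts[1] else 1
    return (labels == text_label).reshape(h, w)


def connected_components(mask):
    """Return bounding boxes (row_min, col_min, row_max, col_max) of the
    8-connected True regions of `mask`, sorted left to right."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill from this unvisited text pixel.
            queue = deque([(r, c)])
            seen[r, c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return sorted(boxes, key=lambda b: b[1])


if __name__ == "__main__":
    # Synthetic example: two blue "characters" on a red background.
    img = np.full((10, 20, 3), (200, 60, 60), dtype=np.uint8)
    img[2:8, 3:6] = (40, 60, 200)
    img[2:8, 10:14] = (40, 60, 200)
    print(connected_components(color_binarize(img)))
```

On the synthetic image the two blue blocks come out as two separate components. A real camera-image pipeline would need far more than this, for example handling uneven lighting, more than two color classes, and touching or broken characters, which is where the stroke analysis mentioned in the abstract would come in.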
