Level Set Methodology for Tamil Document Image Binarization and Segmentation

The most challenging task in OCR is segmenting the characters properly. The accuracy of segmentation depends on the quality of the binarization technique applied. Binarization is the process of setting all intensity values greater than some threshold value to "on" and all remaining values to "off". It converts the document image into a binary image by extracting the text and eliminating the background. This process also removes noise. The output of this process is used as input to the image segmentation process. Conventionally, separate methods are used for binarization and segmentation. In this paper we investigate the use of recently introduced convex optimization methods, the selective local/global segmentation (SLGS) algorithm (16) and the fast global minimization (FGM) algorithm (15), for simultaneous binarization and segmentation. Of the two methods we tried, one is found to be suitable for the OCR task. The FGM algorithm provides an average accuracy of 89.97% for Tamil character segmentation.

Keywords: Level Set, Active Contours, Binarization, Segmentation.
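As a point of reference for the conventional step described in the abstract, global threshold binarization can be sketched in a few lines. This is only an illustrative sketch of plain thresholding, not the SLGS or FGM method studied in the paper; the function name `binarize`, the sample patch, and the threshold value 128 are assumptions chosen for the example.

```python
import numpy as np

def binarize(gray, threshold):
    """Return a boolean mask that is True ("on") where intensity > threshold.

    Hypothetical helper for illustration; the threshold is supplied by the
    caller and is not estimated from the image.
    """
    gray = np.asarray(gray)
    return gray > threshold

# Small synthetic grayscale patch (assumed data): bright strokes on a dark background.
patch = np.array([[12, 200, 198, 15],
                  [10, 205,  20, 11],
                  [ 9, 199, 201, 14]], dtype=np.uint8)

mask = binarize(patch, threshold=128)
on_pixels = int(mask.sum())  # number of pixels set to "on"
```

For dark text on a light page, the same mask can be inverted with `~mask` so that the text pixels are the ones marked "on".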

[1] Xavier Bresson, et al. Fast Global Minimization of the Active Contour/Snake Model, 2007, Journal of Mathematical Imaging and Vision.

[2] Shigeru Akamatsu, et al. Recognizing Characters in Scene Images, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[3] Nikos Papamarkos, et al. An Evaluation Technique for Binarization Algorithms, 2008, J. Univers. Comput. Sci.

[4] Lance Chun Che Fung, et al. A Review of Evaluation of Optimal Binarization Technique for Character Segmentation in Historical Manuscripts, 2010, 2010 Third International Conference on Knowledge Discovery and Data Mining.

[5] Anil K. Jain, et al. Locating text in complex color images, 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[6] Bülent Sankur, et al. The performance evaluation of thresholding algorithms for optical character recognition, 1997, Proceedings of the Fourth International Conference on Document Analysis and Recognition.

[7] Øivind Due Trier, et al. Evaluation of Binarization Methods for Document Images, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[8] André Marion, et al. Introduction to Image Processing, 1990, Springer US.

[9] Chew Lim Tan, et al. Adaptive Region Growing Color Segmentation for Text Using Irregular Pyramid, 2004, Document Analysis Systems.

[10] Matti Pietikäinen, et al. Adaptive document image binarization, 2000, Pattern Recognit.

[11] Jonghyun Park, et al. Korean Text Detection and Binarization in Color Signboards, 2008, 2008 International Conference on Advanced Language Processing and Web Information Technology.

[12] Anthony J. Yezzi, et al. Gradient flows and geometric active contour models, 1995, Proceedings of IEEE International Conference on Computer Vision.

[13] Lei Zhang, et al. Active contours with selective local or global segmentation: A new formulation and level set method, 2010, Image Vis. Comput.

[14] Andrew K. C. Wong, et al. A new method for gray-level picture thresholding using the entropy of the histogram, 1985, Comput. Vis. Graph. Image Process.

[15] Nikos Papamarkos, et al. An evaluation survey of binarization algorithms on historical documents, 2008, 2008 19th International Conference on Pattern Recognition.

[16] Alain Trémeau, et al. Extreme value theory based text binarization in documents and natural scenes, 2010.

[17] Bülent Sankur, et al. Survey over image thresholding techniques and quantitative performance evaluation, 2004, J. Electronic Imaging.

[18] Mohamed Cheriet, et al. A local linear level set method for the binarization of degraded historical document images, 2012, International Journal on Document Analysis and Recognition (IJDAR).

[19] V. Caselles, et al. A geometric model for active contours in image processing, 1993.

[20] Xavier Bresson. A Short Guide on a Fast Global Minimization Algorithm for Active Contour Models, 2009.

[21] A.W.M. Smeulders, et al. An introduction to image processing, 1991.