Extreme value theory based text binarization in documents and natural scenes

This paper presents a novel image binarization method that can deal with degradations such as shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. A pre-processing procedure based on morphological operations is first applied to suppress light/dark structures connected to the image border. A novel binarization concept based on the difference of gamma functions is then presented. Next, the Generalized Extreme Value Distribution (GEVD) is used to find a proper binarization threshold at a given significance level. The proposed method emphasizes the region of interest (with the help of the morphological operations) and generates fewer noisy artifacts (owing to the GEVD). It is much simpler than other methods and works better on degraded documents and natural scene images.
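The idea of choosing a threshold from an extreme value distribution at a significance level can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the Gumbel special case of the GEVD (shape parameter zero), fitted by the method of moments, and it assumes the threshold is the upper (1 - alpha) quantile of the fitted distribution. The function names `gumbel_threshold` and `binarize`, and the default `alpha`, are hypothetical.

```python
import math
import statistics

# Euler-Mascheroni constant, used by the Gumbel method-of-moments fit.
EULER_GAMMA = 0.5772156649015329


def gumbel_threshold(values, alpha=0.05):
    """Upper (1 - alpha) quantile of a Gumbel distribution fitted to values.

    Assumption: the Gumbel distribution (GEVD with shape 0) stands in for
    the full GEVD, and its parameters come from the sample mean and
    standard deviation rather than maximum likelihood.
    """
    scale = statistics.pstdev(values) * math.sqrt(6.0) / math.pi
    loc = statistics.mean(values) - EULER_GAMMA * scale
    # Inverse CDF of the Gumbel distribution evaluated at 1 - alpha.
    return loc - scale * math.log(-math.log(1.0 - alpha))


def binarize(values, alpha=0.05):
    """Mark values above the fitted threshold as foreground (1)."""
    threshold = gumbel_threshold(values, alpha)
    return [1 if v > threshold else 0 for v in values]
```

Under these assumptions, a smaller `alpha` pushes the threshold further into the upper tail, so fewer values are marked as foreground. For a list of 95 values at 0.1 and 5 values at 0.9, `binarize` marks only the five large values.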
