A new Tsallis entropy-based thresholding algorithm for images of historical documents

This paper presents an algorithm for thresholding images of historical documents. The main objective is to generate high-quality monochromatic images that are easily accessible over the Internet and that achieve high recognition rates with Optical Character Recognition algorithms. The new algorithm is based on the classical entropy concept and a variation defined by the Tsallis entropy, and it proved to be more efficient than classical thresholding algorithms. The generated images are analyzed using precision, recall, accuracy, and specificity.
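To make the two ingredients named above concrete, the following Python sketch shows a generic Tsallis-entropy threshold search over a gray-level histogram and the four evaluation measures computed against a ground-truth binarization. It is not the algorithm of this paper, whose details the abstract does not give; it follows the commonly used formulation in which the threshold maximizes the pseudo-additive combination of the Tsallis entropies of the two classes. The function names, the entropic index `q=0.8`, and the convention that pixels at or below the threshold are ink are all assumptions made for illustration.

```python
def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum(p_i**q)) / (q - 1), defined for q != 1."""
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def tsallis_threshold(hist, q=0.8):
    """Return the gray level t that maximizes S_A + S_B + (1 - q) * S_A * S_B.

    hist is a list of pixel counts per gray level. Class A holds levels <= t
    (assumed to be ink), class B holds levels > t (assumed to be paper).
    Returns None if no split leaves both classes non-empty.
    """
    total = sum(hist)
    best_t, best_score = None, float("-inf")
    below = 0
    for t in range(len(hist) - 1):
        below += hist[t]
        above = total - below
        if below == 0 or above == 0:
            continue  # one class is empty, so this split is not usable
        # Normalize each side of the histogram into its own distribution.
        s_a = tsallis_entropy([h / below for h in hist[: t + 1] if h > 0], q)
        s_b = tsallis_entropy([h / above for h in hist[t + 1 :] if h > 0], q)
        score = s_a + s_b + (1.0 - q) * s_a * s_b
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarization_metrics(predicted, truth):
    """Compare two flat sequences of 0/1 labels, where 1 marks an ink pixel."""
    tp = sum(1 for p, g in zip(predicted, truth) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, truth) if p == 1 and g == 0)
    tn = sum(1 for p, g in zip(predicted, truth) if p == 0 and g == 0)
    fn = sum(1 for p, g in zip(predicted, truth) if p == 0 and g == 1)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

A caller would build `hist` from the grayscale image, label every pixel with value at or below the returned threshold as ink, and pass the flattened result together with a hand-made reference binarization to `binarization_metrics`.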
