Improvement of Image Binarization Methods Using Image Preprocessing with Local Entropy Filtering for Alphanumerical Character Recognition Purposes

Automatic text recognition from natural images acquired in uncontrolled lighting conditions is a challenging task, because shadows hinder the shape analysis and classification of individual characters. Since optical character recognition methods require prior image binarization, classical global thresholding methods applied in such cases cannot preserve the visibility of all characters. Even adaptive binarization does not always lead to satisfactory results for heavily unevenly illuminated document images. In this paper, an image preprocessing method based on local image entropy filtering is proposed, which improves the results of various commonly used image thresholding methods and can therefore also be useful for text recognition. The proposed approach was verified using a dataset of 140 differently illuminated document images subjected to further text recognition. Experimental results, expressed as Levenshtein distances and F-Measure values for the obtained text strings, are promising and confirm the usefulness of the proposed approach.
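The two building blocks named in the abstract, a local entropy map and the Levenshtein distance, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the window radius, the clipping of windows at image borders, and the function names `local_entropy` and `levenshtein` are assumptions made only for this example, and the abstract does not state how the entropy map is combined with the subsequent thresholding step.

```python
import math
from collections import Counter


def local_entropy(image, radius=1):
    """Shannon entropy (in bits) of the grey levels around each pixel.

    `image` is a list of rows of integer grey levels. The square window of
    side 2*radius+1 is clipped at the image borders (an assumption of this
    sketch, not something stated in the abstract).
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Histogram of grey levels inside the clipped window.
            counts = Counter()
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    counts[image[yy][xx]] += 1
            total = sum(counts.values())
            # H = sum over grey levels of p * log2(1 / p).
            result[y][x] = sum(
                (c / total) * math.log2(total / c) for c in counts.values()
            )
    return result


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string `a` into string `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

In such a sketch, uniform background regions yield an entropy close to zero while windows that contain both text and background yield higher values, which is why an entropy map can help separate text regions from smoothly varying illumination. The Levenshtein distance would then be computed between the recognized string and the ground-truth string, with lower values indicating better recognition.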
