Degraded document image binarization using structural symmetry of strokes

Abstract: This paper presents an effective approach for local-threshold binarization of degraded document images. We use structural symmetric pixels (SSPs) to calculate local thresholds within a neighborhood, and a vote among multiple thresholds determines whether a pixel belongs to the foreground. SSPs are defined as pixels around strokes whose gradient magnitudes are sufficiently large and whose gradient orientations are symmetrically opposite. SSPs are extracted from a compensated gradient map to weaken the influence of document degradations. To extract SSP candidates with large magnitudes and to distinguish faint characters from bleed-through background, we propose an adaptive global threshold selection algorithm. To further extract pixels with opposite orientations, an iterative stroke width estimation algorithm ensures that the neighborhood used in the orientation judgment has a proper size. Finally, we present a multiple-threshold voting framework to handle inaccurate SSP detections. Experimental results on seven public document image binarization datasets show that our method is accurate and robust compared with many traditional and state-of-the-art document binarization approaches, as assessed by multiple evaluation measures.
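The abstract does not state the threshold formula or the voting rule, so the following is only a minimal sketch of the multiple-threshold voting idea under stated assumptions. It assumes an SSP mask has already been extracted, takes the local threshold to be the mean intensity of the SSPs inside a square window (an assumption, not necessarily the paper's definition), and uses a simple majority vote over several window radii. The names `box_sum`, `vote_binarize`, and `radii` are hypothetical.

```python
import numpy as np


def box_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 window centred on each pixel (zero padding)."""
    padded = np.pad(np.asarray(a, dtype=np.float64), radius, mode="constant")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    k = 2 * radius + 1
    h, w = np.shape(a)
    return ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]


def vote_binarize(gray, ssp_mask, radii=(2, 4, 8)):
    """Return a boolean foreground map from a vote over several local thresholds.

    Assumption: each local threshold is the mean intensity of the SSPs in the
    window, and dark pixels (intensity below the threshold) are foreground.
    """
    gray = np.asarray(gray, dtype=np.float64)
    ssp = np.asarray(ssp_mask, dtype=np.float64)
    votes = np.zeros(gray.shape, dtype=int)
    for r in radii:
        count = box_sum(ssp, r)         # number of SSPs in each window
        total = box_sum(gray * ssp, r)  # summed intensity of those SSPs
        # Windows without any SSP get a threshold of -inf, i.e. a background vote.
        thresh = np.where(count > 0, total / np.maximum(count, 1), -np.inf)
        votes += gray < thresh
    # Foreground only if a strict majority of the thresholds agree.
    return votes * 2 > len(radii)
```

As a usage example, on a 9x9 image with background intensity 200, a dark stroke of intensity 50 in columns 3 to 5, and SSPs marked on both sides of each stroke edge (columns 2, 3, 5, 6), `vote_binarize` marks exactly the stroke columns as foreground. Averaging SSPs from both sides of an edge places the threshold between stroke and background intensity, which is why the sketch relies on the SSPs coming in symmetric pairs.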
