A statistical tool based binarization method for document images

Binarization of document images is important in applications such as historical document restoration and Optical Character Recognition (OCR). It is a challenging task because of the small difference between foreground and background pixel intensities, intricate font patterns, and noisy backgrounds. This article presents a binarization algorithm for document images that performs well on both handwritten and machine-printed documents. First, each RGB document image is converted to a prominent gray-scale image using statistical tools such as mean, variance, and standard deviation. Next, the gray-scale image is binarized using edge detection. Noise is then removed by analyzing connected-component features. The proposed method is evaluated on the publicly available DIBCO 2016 and DIBCO 2017 datasets. Its performance is satisfactory in terms of F-Measure (FM), pseudo-F-Measure (Fps), Peak Signal-to-Noise Ratio (PSNR), and Distance Reciprocal Distortion (DRD), and it also gives good results on degraded document images.
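
The three stages named above (statistical gray-scale conversion, edge-based binarization, connected-component noise removal) can be illustrated with a minimal pure-Python sketch. It is not the authors' algorithm: the abstract does not give the formulas, so the variance-proportional channel weights, the gradient threshold `edge_thresh`, the rule "threshold at the mean gray level of edge pixels", and the `min_area` cutoff are all assumptions chosen for illustration. The `f_measure` helper uses the standard definition FM = 2PR / (P + R).

```python
from collections import deque
from math import hypot
from statistics import pvariance


def to_gray(rgb):
    """Blend channels with weights proportional to each channel's variance (assumed rule)."""
    variances = [pvariance([px[c] for row in rgb for px in row]) for c in range(3)]
    total = sum(variances)
    weights = [v / total for v in variances] if total > 0 else [1 / 3] * 3
    return [[sum(weights[c] * px[c] for c in range(3)) for px in row] for row in rgb]


def binarize(gray, edge_thresh=30.0):
    """Threshold at the mean gray level of high-gradient (edge) pixels (assumed rule)."""
    h, w = len(gray), len(gray[0])
    edge_vals = []
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            if hypot(gx, gy) > edge_thresh:
                edge_vals.append(gray[y][x])
    if not edge_vals:  # no edges found: treat the page as blank
        return [[0] * w for _ in range(h)]
    t = sum(edge_vals) / len(edge_vals)
    return [[1 if v < t else 0 for v in row] for row in gray]  # 1 = text


def remove_small_components(binary, min_area=4):
    """Erase 8-connected foreground components smaller than min_area pixels."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != 1 or seen[y][x]:
                continue
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and out[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) < min_area:
                for cy, cx in comp:
                    out[cy][cx] = 0
    return out


def f_measure(pred, truth):
    """F-Measure = 2PR / (P + R), with text pixels (value 1) as the positive class."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Synthetic 8x8 page: a 3x3 ink block plus one isolated noise pixel.
    bg, ink = (220, 220, 220), (30, 30, 30)
    page = [[bg] * 8 for _ in range(8)]
    for y in range(2, 5):
        for x in range(2, 5):
            page[y][x] = ink
    page[6][6] = ink
    cleaned = remove_small_components(binarize(to_gray(page)))
    print(sum(map(sum, cleaned)), "foreground pixels kept")
```

In practice the nested-list loops would be replaced by an array library, and the two tunable values (`edge_thresh`, `min_area`) would need to be chosen per dataset; they are placeholders here.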
