Removal of Historical Document Degradations using Conditional GANs

Document binarization is one of the most crucial steps in document analysis and OCR pipelines. Over the past few decades, many traditional algorithms such as Sauvola, Niblack, and Otsu have been used for binarization, but they give insufficient results on historical texts with degradations. Recently, many attempts have been made to solve binarization with deep learning approaches such as autoencoders and fully convolutional networks (FCNs). However, these models do not generalize well qualitatively to real-world historical document images. In this paper, we propose a model based on conditional GANs, which are well known for high-resolution image synthesis. The proposed model treats binarization as an image manipulation task and removes different degradations in historical documents such as stains, bleed-through, and non-uniform shading. It outperforms recent state-of-the-art models for document image binarization, and we support this claim by benchmarking it on the publicly available PHIBC 2012, DIBCO (2009-2017), and Palm Leaf datasets. The main objective of this paper is to illustrate the advantages of generative modeling and adversarial training for document image binarization in a supervised setting, showing good generalization across inter- and intra-class document image domains.
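For context on the classical baselines named above, Otsu's method picks a single global threshold that maximizes the between-class variance of the gray-level histogram. A minimal NumPy sketch is shown below; the function names and the synthetic test image are illustrative, not taken from the paper:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the gray level that maximizes between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    probs = hist / gray.size
    omega = np.cumsum(probs)                    # class-0 probability up to each level
    mu = np.cumsum(probs * np.arange(256))      # cumulative mean up to each level
    mu_t = mu[-1]                               # global mean
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan                  # ignore degenerate splits
    sigma_b2 = (mu_t * omega - mu) ** 2 / denom # between-class variance per threshold
    return int(np.nanargmax(sigma_b2))

def binarize(gray, threshold=None):
    """Global binarization: foreground/background split at the Otsu threshold."""
    t = otsu_threshold(gray) if threshold is None else threshold
    return (gray > t).astype(np.uint8) * 255
```

A single global threshold like this fails exactly in the cases the paper targets, e.g. non-uniform shading, where no one cutoff separates ink from background everywhere in the page.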

[1] Mickaël Coustaty et al. ICFHR2016 Competition on the Analysis of Handwritten Text in Images of Balinese Palm Leaf Manuscripts, 2016, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[2] Li Fei-Fei et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016, ECCV.

[3] Thomas M. Breuel et al. High-Performance OCR for Printed English and Fraktur Using LSTM Networks, 2013, 12th International Conference on Document Analysis and Recognition (ICDAR).

[4] Xujun Peng et al. Using Convolutional Encoder-Decoder for Document Image Binarization, 2017, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[5] Jamshid Shanbehzadeh et al. Noise removal and binarization of scanned document images using clustering of features, 2017.

[6] Salvador España Boquera et al. F-Measure as the Error Function to Train Neural Networks, 2013, IWANN.

[7] Jorge Calvo-Zaragoza et al. A selectional auto-encoder approach for document image binarization, 2017, Pattern Recognition.

[8] N. Otsu. A threshold selection method from gray level histograms, 1979.

[9] Bidyut Baran Chaudhuri et al. A Global-to-Local Approach to Binarization of Degraded Document Images, 2014, 22nd International Conference on Pattern Recognition (ICPR).

[10] Mohamed Cheriet et al. Influence of Color-to-Gray Conversion on the Performance of Document Image Binarization: Toward a Novel Optimization Problem, 2015, IEEE Transactions on Image Processing.

[11] Hyunsoo Kim et al. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks, 2017, ICML.

[12] Syed Saqib Bukhari et al. Robust Binarization of Stereo and Monocular Document Images Using Percentile Filter, 2013, CBDAR.

[13] Jan Kautz et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Aaron C. Courville et al. Improved Training of Wasserstein GANs, 2017, NIPS.

[15] Seema Pardhi et al. An Improved Binarization Method for Degraded Document, 2017.

[16] Anders Brun et al. PDNet: Semantic Segmentation Integrated with a Primal-Dual Network for Document Binarization, 2018, Pattern Recognition Letters.

[17] Alexei A. Efros et al. Image-to-Image Translation with Conditional Adversarial Networks, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Håkan Grahn et al. Document Image Binarization Using Recurrent Neural Networks, 2018, 13th IAPR International Workshop on Document Analysis Systems (DAS).

[19] Wayne Niblack. An Introduction to Digital Image Processing, 1986.

[20] Nikolaos Mitianoudis et al. Document image binarization using local features and Gaussian mixture modeling, 2015, Image and Vision Computing.

[21] Konstantinos Zagoris et al. ICDAR2017 Competition on Document Image Binarization (DIBCO 2017), 2017, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[22] Youcef Chibani et al. Enhancement of Historical Document Images by Combining Global and Local Binarization Technique, 2014.

[23] Matti Pietikäinen et al. Adaptive document image binarization, 2000, Pattern Recognition.

[24] Stephen V. Rice et al. The Fourth Annual Test of OCR Accuracy, 1995.

[25] Andreas Dengel et al. anyOCR: A sequence learning based OCR system for unlabeled historical documents, 2016, 23rd International Conference on Pattern Recognition (ICPR).

[26] Andreas Dengel et al. anyOCR: An Open-Source OCR System for Historical Archives, 2017, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[27] Chris Tensmeyer et al. Document Image Binarization with Fully Convolutional Neural Networks, 2017, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[28] Ioannis Pratikakis et al. ICDAR 2009 Document Image Binarization Contest (DIBCO 2009), 2009, 10th International Conference on Document Analysis and Recognition (ICDAR).

[29] Hossein Ziaei Nafchi et al. Persian Heritage Image Binarization Competition (PHIBC 2012), 2013, First Iranian Conference on Pattern Recognition and Image Analysis (PRIA).

[30] Ioannis Pratikakis et al. ICFHR2014 Competition on Handwritten Document Image Binarization (H-DIBCO 2014), 2014, 14th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[31] Jun-Yan Zhu et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, 2017, IEEE International Conference on Computer Vision (ICCV).

[32] Yoshua Bengio et al. Generative Adversarial Nets, 2014, NIPS.

[33] Thomas M. Breuel et al. The OCRopus Open Source OCR System, 2008, Electronic Imaging.