Degraded Historical Document Binarization: A Review on Issues, Challenges, Techniques, and Future Directions

In this era of digitization, most hardcopy documents are being converted into digital formats, and large quantities of them are stored and preserved through electronic scanning. These documents come from various sources, such as ancient documents, old legal records, medical reports, music scores, palm-leaf manuscripts, and reports on security-related issues. Ancient and historical documents in particular are hard to read because of degradation in the form of low contrast and corrupting artefacts. Degraded document binarization has been studied widely in recent years, and several approaches have been developed to address its issues and challenges. This paper presents a comprehensive review of the issues and challenges faced during image binarization, followed by insights into the various methods used for it. It also discusses advanced methods for enhancing degraded documents that improve document quality during the binarization process. Finally, it discusses the effectiveness and robustness of existing methods and concludes that there is still scope to develop a hybrid approach that can deal with degraded document binarization more effectively.
