Extracting text from degraded document image

The recent era of digitization is expected to digitized many old important documents which are degraded due to various reasons. Degraded document image binarization has many challenges like intensity variation, background contrast variation, bleed through, text size variation and so on. Many approaches are available for document image binarization, but none can handle all types of degradation at once. We proposed an approach which consists of three stages such as preprocessing, Text-Area detection and post-processing. Preprocessing enhances the contrast of the image. Next stage involves identifying Text-Area. Postprocessing technique takes care of false positives and false negative based on intensity values of preprocessed and gray image. The Performance is evaluated based on various quantitative measures and is compared with the method regarded best so far. The algorithm is also expected to be independent of the script, hence is tested on Gujarati degraded document images.

[1]  Xiaoyang Tan,et al.  Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions , 2007, IEEE Transactions on Image Processing.

[2]  Ioannis Pratikakis,et al.  Performance Evaluation Methodology for Historical Document Image Binarization , 2013, IEEE Transactions on Image Processing.

[3]  Suman K. Mitra,et al.  A new denoising filter for brain MR images , 2012, ICVGIP '12.

[4]  Ophir Frieder,et al.  Robust binarization of degraded document images using heuristics , 2013, Electronic Imaging.

[5]  Ioannis Pratikakis,et al.  ICDAR 2011 Document Image Binarization Contest (DIBCO 2011) , 2011, 2011 International Conference on Document Analysis and Recognition.

[6]  Ioannis Pratikakis,et al.  ICDAR 2009 Document Image Binarization Contest (DIBCO 2009) , 2009, 2009 10th International Conference on Document Analysis and Recognition.

[7]  Ioannis Pratikakis,et al.  ICFHR 2012 Competition on Handwritten Document Image Binarization (H-DIBCO 2012) , 2012, 2012 International Conference on Frontiers in Handwriting Recognition.

[8]  Ioannis Pratikakis,et al.  H-DIBCO 2010 - Handwritten Document Image Binarization Competition , 2010, 2010 12th International Conference on Frontiers in Handwriting Recognition.

[9]  Ioannis Pratikakis,et al.  ICDAR 2013 Document Image Binarization Contest (DIBCO 2013) , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[10]  Shijian Lu,et al.  Robust Document Image Binarization Technique for Degraded Document Images , 2013, IEEE Transactions on Image Processing.

[11]  Ioannis Pratikakis,et al.  ICFHR2014 Competition on Handwritten Document Image Binarization (H-DIBCO 2014) , 2014, 2014 14th International Conference on Frontiers in Handwriting Recognition.