Grayification: A meaningful grayscale conversion to improve handwritten historical documents analysis

Abstract This paper presents an improvement to handwriting binarization on color historical documents. We introduce a novel preprocessing step into the usual document image analysis (DIA) workflow: before binarization, a grayification step enhances the input image using a new grayscale conversion algorithm, the grayification algorithm. This algorithm uses both luminance and color information to increase the contrast between foreground and background. We expect the largest gains on documents with non-black ink and with diverse colors, e.g., illuminations in historical manuscripts. Binarization then gives better results on the enhanced grayscale image; in particular, colored text is binarized as well as black text. Because the enhancement is applied at the start of the analysis chain, it eases not only binarization but should also improve the results of the following tasks, such as layout analysis, line segmentation, and OCR. We demonstrate the effects of our preprocessing technique on a set of challenging historical documents, which we make publicly available for research purposes, and on two publicly available datasets. The improvement is illustrated on the binarization task, where the results of four different binarization methods are improved.
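To make the idea of a color-aware grayscale conversion concrete, the following is a minimal sketch and not the grayification algorithm itself, whose details the abstract does not give. It assumes a hypothetical rule, `color_aware_gray`, that darkens a pixel in proportion to its HSV saturation, and it uses a plain global Otsu threshold as the binarization step. The sample RGB values for parchment, red ink, and black ink are invented for illustration.

```python
def luminance(r, g, b):
    """Rec. 601 luma, the usual luminance-only grayscale value (0..255)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def saturation(r, g, b):
    """HSV saturation in [0, 1]; 0 for pure grays and for black."""
    hi, lo = max(r, g, b), min(r, g, b)
    return 0.0 if hi == 0 else (hi - lo) / hi


def color_aware_gray(r, g, b, weight=1.0):
    """Hypothetical rule (an assumption, not the paper's method):
    darken a pixel in proportion to its saturation."""
    value = luminance(r, g, b) * (1.0 - weight * saturation(r, g, b))
    return int(round(max(0.0, min(255.0, value))))


def otsu_threshold(values):
    """Otsu's method: the threshold maximizing between-class variance.
    Pixels with value <= threshold are treated as foreground (ink)."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


# Invented sample colors: parchment background, red ink, black ink.
PARCHMENT = (230, 220, 190)
RED_INK = (200, 30, 30)
BLACK_INK = (30, 30, 30)

pixels = [PARCHMENT] * 6 + [RED_INK] * 2 + [BLACK_INK] * 2
gray = [color_aware_gray(*p) for p in pixels]
threshold = otsu_threshold(gray)
binary = [1 if g <= threshold else 0 for g in gray]  # 1 = ink

# Background/red-ink contrast, luminance only versus color-aware.
contrast_plain = luminance(*PARCHMENT) - luminance(*RED_INK)
contrast_aware = color_aware_gray(*PARCHMENT) - color_aware_gray(*RED_INK)
```

In this toy setting the saturated red ink is pushed toward the dark end of the gray scale, so a single threshold groups it with the black ink. Real manuscripts would need a more careful conversion and a locally adaptive binarization method.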
