Normalized text font resemblance method aimed at document image page clustering

This paper describes an approach to obtaining a normalized measure of text resemblance in scanned images. The technique, aimed at automatic content conversion, relies on detecting standard character features and applies a sequence of procedures and algorithms to the input document. The approach uses only the geometrical characteristics of characters and ignores both context information and character recognition.
