Paper Information - Modeling Adaptive Degraded Document Image Binarization and Optical Character System

Modeling Adaptive Degraded Document Image Binarization and Optical Character System

This paper presents an enhanced system for processing degraded old documents. The system handles degradations caused by shadows, non-uniform illumination, low contrast, and noise, and it separates the two regions of the document, foreground and background. In the de-noising step, different filtering techniques are used to remove noise and to produce a rough estimate of the foreground and background regions. Binarization is then performed by computing an approximate background surface of the original image. The final thresholding step combines the calculated background surface with the preprocessed original image, using a threshold parameter over a predefined local window of a specific size. Different interpolation techniques are applied in this final step to obtain a better-quality binary image: they eliminate noise, improve the quality of the text regions, and preserve stroke connectivity by filling possible breaks, gaps, or holes. The second part of this research deals with optical character recognition (OCR). The result of the preprocessing step (typewritten or printed text, usually captured by a scanner) is converted into machine-editable text. In this phase, we first trained the system on known samples of each character in order to read a specific font "Intelligent system", and then performed the testing step, which converts the image into editable text. The adaptive image passes through several steps: image analysis for characters, detection of individual symbols, line and character boundary detection, character resizing, feature extraction, output computation, and finally display of the character representation of the Unicode output in the Microsoft Office Word application. The proposed system is implemented and tested on actual degraded images. The proposed technique offers good output quality and is quite fast.
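
The binarization stage described in the abstract (de-noising, background surface estimation, then local thresholding) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the filters or the threshold rule, so the 3x3 median filter, the local-maximum background estimate (which assumes dark ink on lighter paper), and the names `median_filter`, `background_surface`, `binarize`, `radius`, and `k` are all assumptions made for illustration. The interpolation and OCR stages are not covered.

```python
from statistics import median


def window(img, r, c, radius):
    """Return the pixel values in the square window centred on (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]


def median_filter(img, radius=1):
    """De-noising step (assumed filter): replace each pixel by the median of its neighbourhood."""
    return [[median(window(img, r, c, radius)) for c in range(len(img[0]))]
            for r in range(len(img))]


def background_surface(img, radius=2):
    """Approximate background surface (assumed estimator): brightest value in each local window.

    The window must be wider than the strokes, otherwise ink leaks into the estimate.
    """
    return [[max(window(img, r, c, radius)) for c in range(len(img[0]))]
            for r in range(len(img))]


def binarize(img, radius=2, k=0.3):
    """Mark a pixel as text (1) when it is darker than the local background by more than k * background."""
    smooth = median_filter(img)
    bg = background_surface(smooth, radius)
    return [[1 if bg[r][c] - smooth[r][c] > k * bg[r][c] else 0
             for c in range(len(img[0]))]
            for r in range(len(img))]


if __name__ == "__main__":
    # Synthetic page: paper brightness falls from 200 to 110 (uneven illumination),
    # with two 2-pixel-wide ink strokes at 40% of the local paper brightness.
    row = [200, 190, 72, 68, 160, 150, 56, 52, 120, 110]
    page = [list(row) for _ in range(6)]
    for line in binarize(page):
        print("".join("#" if v else "." for v in line))
```

Because the threshold is expressed relative to the local background estimate, the same value of `k` applies to both the brightly lit and the shadowed side of the page, which is the motivation for combining a background surface with a local window rather than using one global threshold.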

Yahia S. Halabi | Princes Sumaya | Faris Hamdan | Khaled Haj Yousef
