An Arabic optical character recognition system using recognition-based segmentation

Abstract: Optical character recognition (OCR) systems improve human–machine interaction and are widely used in many areas. Recognizing cursive scripts is difficult because segmenting connected text into individual characters is error-prone. This paper proposes an Arabic OCR system that uses a recognition-based segmentation technique to overcome the classical segmentation problems. A newly developed Arabic word segmentation algorithm is also introduced to separate horizontally overlapping Arabic words and subwords. A feedback loop controls how character fragments are combined for recognition. The implemented system achieves 90% recognition accuracy at a recognition rate of 20 characters per second.
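To make the idea of recognition-based segmentation with a feedback loop concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a word has already been over-segmented into small fragments, and that a classifier returns a label and a confidence for any group of adjacent fragments. The names `recognize_fragments`, `toy_classify`, `max_combine`, and `min_confidence` are hypothetical, and strings stand in for image fragments.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: a real system would pass image fragments, not strings.
Fragment = str
Classifier = Callable[[Sequence[Fragment]], Tuple[str, float]]


def recognize_fragments(
    fragments: Sequence[Fragment],
    classify: Classifier,
    max_combine: int = 3,          # assumed limit on fragments per character
    min_confidence: float = 0.8,   # assumed acceptance threshold
) -> List[str]:
    """Combine adjacent fragments and let the classifier decide the grouping."""
    labels: List[str] = []
    i = 0
    while i < len(fragments):
        best_label, best_conf, best_len = "?", 0.0, 1
        # Feedback loop: try groups of 1..max_combine fragments and keep
        # the grouping the classifier is most confident about.
        for k in range(1, max_combine + 1):
            if i + k > len(fragments):
                break
            label, conf = classify(fragments[i:i + k])
            if conf > best_conf:
                best_label, best_conf, best_len = label, conf, k
        if best_conf >= min_confidence:
            labels.append(best_label)
            i += best_len          # consume every fragment in the accepted group
        else:
            labels.append("?")     # reject: mark as unknown and skip one fragment
            i += 1
    return labels


def toy_classify(group: Sequence[Fragment]) -> Tuple[str, float]:
    """Toy lookup-table classifier, used only to exercise the loop."""
    table = {
        ("a1", "a2"): ("A", 0.95),  # two fragments that form one character
        ("a1",): ("I", 0.60),       # a lone fragment that looks like another character
        ("b",): ("B", 0.90),
    }
    return table.get(tuple(group), ("?", 0.0))


result = recognize_fragments(["a1", "a2", "b", "zz"], toy_classify)
print(result)
```

In this sketch the segmentation points are never fixed in advance: a boundary is accepted only when the classifier is confident about the fragments it encloses, which is the general principle behind recognition-based segmentation.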
