A segmentation-free approach to text recognition with application to Arabic text

Abstract. In recognizing cursive scripts, a major undertaking is segmenting cursive words into characters and isolating merged characters. Segmentation is usually the pivotal stage of the system: a sizable portion of processing is devoted to it, and a considerable share of recognition errors is attributed to it. The most notable feature of Arabic writing is its cursiveness. Compared to other features, the cursiveness of Arabic words poses the most difficult problem for recognition algorithms. In this work, we describe the design and implementation of an Arabic word recognition system. To recognize a word, the system does not segment it into characters in advance; rather, it recognizes the input word by detecting a set of “shape primitives” on the word. It then matches the regions of the word (represented by the detected primitives) with a set of symbol models. A spatial arrangement of symbol models that are matched to regions of the word then becomes the description of the recognized word. Since the number of potential arrangements of all symbol models is combinatorially large, the system imposes a set of constraints that pertain to word structure and spatial consistency. The system searches the space made up of the arrangements that satisfy the constraints, and tries to maximize the a posteriori probability of the arrangement of symbol models. We measure the accuracy of the system not only on words but on isolated characters as well. For isolated characters, it has a recognition rate of 99.7% for synthetically degraded symbols and 94.1% for scanned symbols. For isolated words, the system has a recognition rate of 99.4% for noise-free words, 95.6% for synthetically degraded words, and 73% for scanned words.
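The search over constrained arrangements can be illustrated with a toy sketch. This is not the system's actual algorithm, which the abstract does not specify. It assumes that each symbol model has already been matched to a horizontal span of the word with some probability, that matches are independent, and that "spatial consistency" reduces to covering the word with no gaps and no overlaps. The function name `best_arrangement` and the candidate tuple layout are hypothetical.

```python
from math import log


def best_arrangement(width, candidates):
    """Pick the most probable gap-free, non-overlapping arrangement.

    width      -- number of columns in the word image (assumed unit).
    candidates -- list of (symbol, start, end, prob) tuples, meaning the
                  symbol model matched columns [start, end) with
                  probability prob (0 < prob <= 1).

    Returns (log_probability, [symbols]) for the best arrangement that
    covers [0, width), or None if no arrangement satisfies the constraints.
    """
    # best[x] holds the best (score, symbols) covering columns [0, x).
    best = {0: (0.0, [])}
    for x in range(1, width + 1):
        for symbol, start, end, prob in candidates:
            # Only extend arrangements that end exactly where this match starts.
            if end != x or start not in best:
                continue
            score = best[start][0] + log(prob)
            if x not in best or score > best[x][0]:
                best[x] = (score, best[start][1] + [symbol])
    return best.get(width)


if __name__ == "__main__":
    # Hypothetical matches on a 6-column word.
    matches = [
        ("A", 0, 3, 0.9),
        ("B", 3, 6, 0.8),
        ("C", 0, 6, 0.5),   # one wide symbol competing with A followed by B
        ("D", 0, 2, 0.99),  # strong match that cannot be completed
    ]
    print(best_arrangement(6, matches))
```

Dynamic programming over column positions keeps the search linear in the number of candidates per position instead of enumerating every arrangement, and the constraint check discards candidates such as `D` that score well locally but cannot be part of any consistent description of the whole word.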
