Two-Stage Lexicon Reduction for Offline Arabic Handwritten Word Recognition

When a large number of words must be recognized, eliminating unlikely candidates before recognition is a reasonable and powerful way to increase recognition speed. This paper proposes a holistic two-stage lexicon reduction technique for offline handwritten Arabic word recognition. The technique extracts dots and subwords from the cursive Arabic word image to describe its shape. In the first stage of reduction, the number of subwords in the input word is estimated. In the second stage, a word descriptor based on the dot information is applied only to the candidates selected in the first stage. Experimental results on the IFN/ENIT database, which consists of 26,459 cursive Arabic word images, show a lexicon reduction of 92.5% with an accuracy of 74%.
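The two-stage filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation. The `LexiconEntry` type, the single-character dot encoding, the tolerance parameters, and the use of edit distance to compare dot descriptors are all assumptions made for illustration; the abstract does not specify how descriptors are encoded or matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    word: str        # lexicon label (hypothetical identifier)
    n_subwords: int  # number of subwords in the word
    dots: str        # hypothetical dot descriptor, one symbol per dot group


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two descriptor strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def reduce_lexicon(lexicon, n_subwords, dots, subword_tol=0, dot_tol=0):
    """Return the candidate lists after stage 1 and after stage 2."""
    # Stage 1: keep entries whose subword count matches the estimate
    # (within an assumed tolerance).
    stage1 = [e for e in lexicon
              if abs(e.n_subwords - n_subwords) <= subword_tol]
    # Stage 2: compare dot descriptors, but only for stage-1 survivors.
    stage2 = [e for e in stage1
              if edit_distance(e.dots, dots) <= dot_tol]
    return stage1, stage2


def reduction_rate(original_size: int, reduced_size: int) -> float:
    """Fraction of the lexicon eliminated by the reduction."""
    return 1.0 - reduced_size / original_size


# Toy lexicon; the descriptor symbols are made up for this example.
lexicon = [
    LexiconEntry("w1", 2, "aB"),
    LexiconEntry("w2", 2, "ab"),
    LexiconEntry("w3", 3, "aB"),
    LexiconEntry("w4", 1, ""),
]
stage1, stage2 = reduce_lexicon(lexicon, n_subwords=2, dots="aB")
```

Running the cheap subword-count test first means the dot descriptor comparison is evaluated only on the smaller stage-1 list. Loosening `subword_tol` or `dot_tol` would trade a lower reduction rate for a better chance that the true word survives the reduction.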
