Lexicon reduction using dots for off-line Farsi/Arabic handwritten word recognition

Unlike the alphabets of many other languages, the Farsi alphabet has dots on 18 of its 32 characters, appearing in groups of one, two or three. Some of these letters share a common primary shape and differ only in the number of dots and in whether the dots lie above or below that shape. This paper introduces the use of dots in cursively handwritten Farsi/Arabic words for lexicon reduction and presents a fast method for extracting them. The technique extracts and represents the number and position of dots in off-line handwritten words and uses this information to eliminate unlikely candidates from the lexicon. Experimental results on a set of 12,000 handwritten word images yield a lexicon reduction of 93% with an accuracy of 85%. The proposed lexicon reduction algorithm achieves a speedup factor of 2 as well as a 13% improvement in recognition rate.
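The pruning idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the dot extractor has already produced the number of dots above and below the baseline for an input word image, and it keeps only lexicon entries whose expected dot counts agree within a tolerance. The table `DOTS`, the sample `LEXICON`, the function names and the tolerance rule are all hypothetical choices made for illustration. The table is deliberately partial, and the definition of reduction used in `reduction_rate` (fraction of entries removed) is likewise an assumption.

```python
from typing import Dict, List, Tuple

# Illustrative, partial table (an assumption, not taken from the paper):
# letter -> (number of dots, position relative to the primary shape).
DOTS: Dict[str, Tuple[int, str]] = {
    "ب": (1, "below"), "پ": (3, "below"), "ت": (2, "above"),
    "ث": (3, "above"), "ج": (1, "below"), "خ": (1, "above"),
    "ز": (1, "above"), "ش": (3, "above"), "ن": (1, "above"),
}

# Hypothetical toy lexicon used only to exercise the functions below.
LEXICON: List[str] = ["بابا", "نان", "آب", "تاب", "شب", "درس"]


def dot_signature(word: str) -> Tuple[int, int]:
    """Return (dots above, dots below) expected for a lexicon word."""
    above = below = 0
    for ch in word:
        count, position = DOTS.get(ch, (0, ""))  # undotted letters add nothing
        if position == "above":
            above += count
        elif position == "below":
            below += count
    return above, below


def reduce_lexicon(lexicon: List[str], observed: Tuple[int, int],
                   tol: int = 0) -> List[str]:
    """Keep entries whose dot counts are within `tol` of the observed counts."""
    kept = []
    for word in lexicon:
        above, below = dot_signature(word)
        if abs(above - observed[0]) <= tol and abs(below - observed[1]) <= tol:
            kept.append(word)
    return kept


def reduction_rate(original_size: int, reduced_size: int) -> float:
    """Fraction of the lexicon removed by pruning (assumed definition)."""
    return 1.0 - reduced_size / original_size


# Suppose the (assumed) extractor reports two dots above and one below.
candidates = reduce_lexicon(LEXICON, observed=(2, 1), tol=0)
rate = reduction_rate(len(LEXICON), len(candidates))
```

A non-zero `tol` trades reduction for robustness: writers often merge two or three dots into a single stroke, so an exact match would discard the correct word more often, while a looser match leaves more candidates for the recognizer to score.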
