Strategies for Large Handwritten Farsi/Arabic Lexicon Reduction

When a large number of words must be recognized, a lexicon reduction strategy that eliminates unlikely candidates before recognition is a reasonable and powerful way to increase recognition speed. In this paper, we describe a holistic approach to reducing large Arabic handwritten lexicons that is based on inherent properties of Arabic writing. The principle of the technique is to extract dots, diacritics and subwords from the cursive Arabic word image in order to describe its shape. In the first stage of lexicon reduction, the number of subwords in the input word is estimated. In the second stage, a word descriptor based on the dot and diacritic information is matched against only the candidates selected in the first stage. Experimental results on the IFN/ENIT database, consisting of 26,459 cursive Arabic word images, show a lexicon reduction of 92.5% with an accuracy of 74%.
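The two-stage filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the descriptor alphabet (one character per dot or diacritic mark), the tolerance on the subword count, the use of edit distance for descriptor matching, and all names (`LexiconEntry`, `reduce_lexicon`, `subword_tol`, `max_dist`) are hypothetical. Extraction of subwords and dots from the word image is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    word: str        # lexicon word (placeholder labels here)
    n_subwords: int  # number of subwords in the word
    marks: str       # hypothetical dot/diacritic descriptor, one char per mark


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two descriptor strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def reduce_lexicon(lexicon, est_subwords, est_marks,
                   subword_tol=1, max_dist=1):
    """Return the lexicon entries that survive both stages."""
    # Stage 1: keep entries whose subword count is close to the estimate.
    stage1 = [e for e in lexicon
              if abs(e.n_subwords - est_subwords) <= subword_tol]
    # Stage 2: among those, keep entries whose mark descriptor is close
    # to the descriptor extracted from the input image.
    return [e for e in stage1
            if edit_distance(e.marks, est_marks) <= max_dist]


def reduction_rate(lexicon_size: int, kept: int) -> float:
    """Fraction of the lexicon eliminated (assumed definition)."""
    return 1.0 - kept / lexicon_size


# Toy lexicon. Assumed mark codes: 'u' one dot above, 'T' three dots above,
# 'd' one dot below, 'D' two dots below.
LEXICON = [
    LexiconEntry("w1", 2, "uD"),
    LexiconEntry("w2", 2, "uDd"),
    LexiconEntry("w3", 4, "uD"),
    LexiconEntry("w4", 2, "TTd"),
    LexiconEntry("w5", 3, "u"),
]

candidates = reduce_lexicon(LEXICON, est_subwords=2, est_marks="uD")
print([e.word for e in candidates],
      reduction_rate(len(LEXICON), len(candidates)))
```

The tolerances trade reduction against the risk of discarding the correct word: tighter values remove more candidates, but an error in the estimated subword count or in the detected marks is then more likely to eliminate the true entry.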
