Detection and compensation of undesirable discontinuities within Farsi/Arabic subwords

This paper introduces a previously unexplored problem in the preprocessing of handwritten Farsi/Arabic words. Subwords play a vital role in many applications, such as cheque amount recognition, text recognition, lexicon reduction, and subword-based word recognition, so correcting the faults that occur in subwords improves the overall performance of these applications. A subword is a connected component in the main body of a word. A discontinuity within a subword divides it into two isolated parts, which are then incorrectly detected as two separate subwords. In our algorithm, before these faults are corrected, the baseline of each subword is corrected using the proposed baseline correction method. The dots are then removed to limit the search area in the matching stage. Undesirable discontinuities in subwords are detected using a template matching algorithm, and the disconnected parts of a subword are joined using three different methods. Experiments show that the cubic polynomial-based compensation method gives the best results, with a 2.87% improvement in the subword recognition rate.
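To make the idea concrete, the following is a minimal sketch, not the method of this paper: it only illustrates how a discontinuity turns one subword into two connected components, and how a cubic polynomial can bridge the gap. It assumes the gap has already been located (the template matching stage is not shown) and that two pixels on each side of the gap are known. The cubic here is interpolated exactly through those four points, which is a simplification chosen for illustration. All names (`count_components`, `cubic_through`, `bridge_gap`, `image`, `left_end`, `right_end`) are hypothetical.

```python
from collections import deque


def count_components(img):
    """Count 8-connected components of foreground (1) pixels."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def cubic_through(points):
    """Return the cubic (Lagrange form) through four (x, y) points with distinct x."""
    def curve(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = float(yi)
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return curve


def bridge_gap(img, left_pts, right_pts):
    """Return a copy of img with the gap between two fragments filled along a cubic.

    left_pts / right_pts: two (x, y) stroke pixels on each side of the gap,
    ordered by increasing x (assumed to be given by an earlier detection stage).
    """
    curve = cubic_through(left_pts + right_pts)
    out = [row[:] for row in img]
    max_row = len(img) - 1
    prev_y = left_pts[-1][1]
    for x in range(left_pts[-1][0] + 1, right_pts[0][0] + 1):
        y = min(max(round(curve(x)), 0), max_row)  # clamp to the image
        # Fill the vertical run so consecutive columns stay 8-connected.
        for yy in range(min(prev_y, y), max(prev_y, y) + 1):
            out[yy][x] = 1
        prev_y = y
    return out


# Toy subword: a slightly curved stroke with columns 3..5 missing.
image = [[0] * 9 for _ in range(6)]
for x, y in [(0, 4), (1, 4), (2, 3), (6, 3), (7, 4), (8, 4)]:
    image[y][x] = 1

left_end = [(1, 4), (2, 3)]    # last two pixels before the gap
right_end = [(6, 3), (7, 4)]   # first two pixels after the gap

repaired = bridge_gap(image, left_end, right_end)
print(count_components(image), count_components(repaired))
```

In this toy case the broken stroke is counted as two components before bridging and one afterwards, which is the effect the compensation step aims for. A least-squares cubic fitted to more stroke pixels would be a natural alternative to exact interpolation through four points.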
