Non-uniform slant estimation and correction for Farsi/Arabic handwritten words

Slant correction is an important part of normalization in OCR applications. Because of particular characteristics of Farsi and Arabic handwriting, conventional deslanting methods proposed for other languages do not work properly on these scripts. In this paper, a fast method based on directional filters is first introduced to estimate the overall slant of a handwritten word. After overall deslanting, a novel non-uniform slant estimation algorithm computes the remaining slant of each near-vertical stroke of the word separately: each candidate stroke is traced and its slant is calculated. A non-uniform slant correction algorithm is also proposed to reduce the remaining slant of each candidate stroke while keeping the distortion of the other strokes of the word to a minimum. Exploiting the characteristics of Farsi/Arabic scripts, slants are estimated within a specific strip of the written word. The approach is compared with three other prevalent methods. Experiments show that the proposed overall slant estimation method not only yields the lowest estimation error but is also the fastest algorithm, and that the best results are achieved by combining the proposed overall and non-uniform deslanting methods. It is concluded that successful results can be achieved by taking the particular characteristics of these two scripts into account.
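To make the overall (uniform) deslanting step concrete, the following is a minimal sketch of slant correction by horizontal shear. It is not the method of this paper: the estimator shown is a generic projection-profile search swapped in for the directional-filter estimator, whose details the abstract does not give. The function names `shear_deslant` and `estimate_slant`, the sign convention (positive angle means strokes lean to the right), and the candidate angle range are all assumptions made for illustration.

```python
import math
import numpy as np


def shear_deslant(img: np.ndarray, slant_deg: float) -> np.ndarray:
    """Undo a uniform slant by shifting each row horizontally.

    img: 2-D array of ink pixels (non-zero = ink), row 0 is the top.
    slant_deg: assumed slant from the vertical, positive when the top
    of a stroke is displaced to the right of its bottom.
    """
    h, w = img.shape
    t = math.tan(math.radians(slant_deg))
    # Extra columns so that no shifted row falls outside the output.
    pad = int(math.ceil(abs(t) * (h - 1)))
    out = np.zeros((h, w + pad), dtype=img.dtype)
    for y in range(h):
        dy = h - 1 - y                 # height above the bottom row
        shift = int(round(-t * dy))    # higher rows move further
        offset = shift + (pad if t > 0 else 0)
        out[y, offset:offset + w] = img[y]
    return out


def estimate_slant(img: np.ndarray, candidates=range(-45, 46)) -> int:
    """Hypothetical stand-in estimator (NOT the directional-filter method).

    Tries each candidate angle and keeps the one whose deslanted image
    has the most concentrated vertical projection profile.
    """
    def score(angle: int) -> float:
        profile = shear_deslant(img, angle).sum(axis=0).astype(float)
        return float((profile ** 2).sum())

    return max(candidates, key=score)
```

Under these assumptions, `shear_deslant(word, estimate_slant(word))` would give an overall-deslanted word image. The non-uniform stage described above would then operate on individual near-vertical strokes rather than on whole rows, which this sketch does not attempt.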
