Segmentation of Printed Farsi/Arabic Words

Character connectivity is a central problem in automated recognition of printed Farsi/Arabic script. This paper introduces a wavelet-based scheme for segmenting printed Farsi/Arabic words into characters. The algorithm employs a new wavelet transform whose coefficients are used to detect the underlying horizontal edges and the baseline. The projection of the horizontal edges, together with their location relative to the baseline, provides candidate segmentation points, and a classification method then distinguishes the true segmentation points. The new algorithm is robust against noise and against variations in gray level, font, and character size. Simulation results compare the new algorithm with three existing schemes (closed contour, structural, and holistic) in terms of precision, speed, and robustness against Gaussian noise. Experimental results indicate that our scheme is superior in precision and that it improves recognition speed by a factor of at least 2.5.
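The pipeline described above (horizontal edges, baseline, projection, selection of segmentation points) can be illustrated with a minimal sketch. This is not the algorithm of the paper: the new wavelet transform is not specified here, so the sketch substitutes a first-level undecimated Haar high-pass filter along the vertical axis (a plain row difference), and it replaces the classification step with a simple rule that discards candidates touching the image border. The function names, the `band` parameter, and the synthetic test word are all assumptions made for illustration.

```python
import numpy as np

def horizontal_edges(img):
    # Stand-in for the paper's wavelet: undecimated Haar high-pass along
    # the vertical axis. Entry i responds to a change between rows i and i+1.
    return np.abs(np.diff(img.astype(float), axis=0)) / 2.0

def baseline_row(img):
    # Baseline estimate: the row with the largest horizontal projection of ink.
    return int(np.argmax(img.sum(axis=1)))

def candidate_columns(img, band=2):
    # A column is a candidate when every horizontal edge in it lies within
    # `band` rows of the baseline, i.e. only the connecting stroke is present.
    edges = horizontal_edges(img)
    base = baseline_row(img)
    rows = np.arange(edges.shape[0])
    near = np.abs(rows - base) <= band
    has_edge = edges.sum(axis=0) > 0
    off_base = edges[~near].sum(axis=0) > 0
    return np.flatnonzero(has_edge & ~off_base)

def segmentation_points(img, band=2):
    # Group consecutive candidate columns into runs and keep the midpoint of
    # each run that does not touch the left or right border (hypothetical
    # substitute for the paper's classification of true segmentation points).
    cols = candidate_columns(img, band)
    if cols.size == 0:
        return []
    runs = np.split(cols, np.flatnonzero(np.diff(cols) > 1) + 1)
    last = img.shape[1] - 1
    return [int((r[0] + r[-1]) // 2) for r in runs if r[0] > 0 and r[-1] < last]

def make_demo_word():
    # Synthetic binary "word": two character bodies joined by a baseline stroke.
    img = np.zeros((8, 12), dtype=int)
    img[4:6, :] = 1        # connecting stroke on rows 4-5
    img[1:6, 1:4] = 1      # first character body
    img[1:6, 8:11] = 1     # second character body
    return img

if __name__ == "__main__":
    word = make_demo_word()
    print(baseline_row(word))         # 4
    print(segmentation_points(word))  # [5]
```

On the synthetic word, the only interior run of baseline-only columns is 4 to 7, so the sketch reports a single cut at column 5. A real implementation would need the paper's actual wavelet and a trained classifier to reject false candidates inside wide characters.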
