A Script Independent Technique for Extraction of Characters from Handwritten Word Images

This paper reports a script-independent technique for segmenting handwritten word images into characters. Word-to-character segmentation is an important preprocessing step in optical character recognition (OCR). In handwritten text, however, touching characters reduce the accuracy of segmentation. Handwritten words in four different scripts, namely Bangla, Devanagari, Gurmukhi and Syloti, are used as test samples. All four scripts are characterized by a distinct horizontal line, called the headline or Matra, that runs along the top of most of the characters forming a word. Unlike in English script, the components of a character in these handwritten scripts often encircle the main character, which makes conventional segmentation methods inapplicable. The proposed technique uses two fuzzy features: one to identify the Matra region and one to locate potential segmentation points. Experiments on a sample of 400 handwritten word images covering Bangla, Devanagari, Gurmukhi and Syloti show success rates of 95.41%, 93.61%, 91.23% and 92.37%, respectively.
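The abstract does not define the two fuzzy features, so the code below is only a minimal sketch of how headline-based segmentation with fuzzy memberships can work; it is not the authors' method. It assumes a binary image (1 = ink, 0 = background). As an assumption, a row's Matra membership is taken to be its ink count relative to the densest row, and a column's segmentation membership is taken to be one minus its ink density below the Matra. The function names and the thresholds `alpha` and `beta` are hypothetical. Rows and columns are selected with alpha-cuts (membership at or above a threshold).

```python
from typing import List

# Binary word image: 1 = ink pixel, 0 = background (assumed representation).
Image = List[List[int]]


def matra_membership(img: Image) -> List[float]:
    """Hypothetical fuzzy feature 1: degree to which each row belongs to the
    Matra (headline). Assumed membership = row ink count / densest row's count."""
    counts = [sum(row) for row in img]
    peak = max(counts, default=0)
    if peak == 0:
        return [0.0] * len(img)
    return [c / peak for c in counts]


def segmentation_membership(img: Image, matra_rows: List[int]) -> List[float]:
    """Hypothetical fuzzy feature 2: degree to which each column is a
    segmentation point. Assumed membership = 1 - (ink below Matra / rows below)."""
    width = len(img[0]) if img else 0
    start = max(matra_rows) + 1 if matra_rows else 0
    below = img[start:]
    if not below:
        return [0.0] * width
    return [1.0 - sum(row[c] for row in below) / len(below) for c in range(width)]


def cut_columns(img: Image, alpha: float = 0.8, beta: float = 0.9) -> List[int]:
    """Return one cut column per gap, using alpha-cuts on both memberships."""
    matra = [r for r, m in enumerate(matra_membership(img)) if m >= alpha]
    seg = segmentation_membership(img, matra)
    # Ignore the left/right margins: keep only columns strictly between the
    # first and last columns that carry any ink below the Matra.
    inked = [c for c, m in enumerate(seg) if m < 1.0]
    if not inked:
        return []
    first, last = inked[0], inked[-1]
    candidates = [c for c in range(first + 1, last) if seg[c] >= beta]
    # Collapse each run of consecutive candidate columns into its middle column.
    cuts: List[int] = []
    run: List[int] = []
    for c in candidates:
        if run and c != run[-1] + 1:
            cuts.append(run[len(run) // 2])
            run = []
        run.append(c)
    if run:
        cuts.append(run[len(run) // 2])
    return cuts


if __name__ == "__main__":
    # Toy word: a headline on row 1 joining two three-column "characters".
    word = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 1, 0, 0, 1, 0, 1],
        [0, 1, 0, 1, 0, 0, 1, 0, 1],
        [0, 1, 1, 1, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
    ]
    print(cut_columns(word))  # -> [5]
```

Because this sketch only cuts where the region below the Matra is empty, it does not handle touching characters; that harder case is what motivates richer segmentation features such as the ones this paper proposes.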
