Handwriting segmentation of unconstrained Oriya text

Segmenting handwritten text into lines, words and characters is an important step in handwritten text recognition. In this paper we propose a water reservoir concept-based scheme for segmenting unconstrained Oriya handwritten text into individual characters. The text image is first segmented into lines, and the lines are then segmented into individual words. For line segmentation, the document is divided into vertical stripes, with the stripe width calculated by analysing the heights of the water reservoirs obtained from different components of the document. Stripe-wise horizontal histograms are then computed, and the relationship between the peak and valley points of the histograms is used to segment the lines. Text lines are segmented into words based on vertical projection profiles and structural features of Oriya characters. For character segmentation, the isolated and connected (touching) characters in a word are first detected. Touching characters are then segmented using structural, topological and water reservoir concept-based features. In our experiments, the proposed “touching character” segmentation module achieved 96.7% accuracy on two-character touching strings.
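
The two projection-based steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names (`stripe_histograms`, `runs_of_ink`, `split_words`) and the parameters `stripe_width` and `min_gap` are hypothetical, the stripe width is passed in directly rather than derived from water reservoir heights, valleys are simplified to rows with no ink, and the structural features of Oriya characters are replaced by a plain gap threshold.

```python
from typing import List, Tuple

# Binary image as a list of rows: 1 = ink pixel, 0 = background.
Image = List[List[int]]


def stripe_histograms(img: Image, stripe_width: int) -> List[List[int]]:
    """Horizontal histogram (ink pixels per row) for each vertical stripe.

    Assumption: stripe_width is given; the paper derives it from
    water reservoir heights, which is not modelled here.
    """
    width = len(img[0]) if img else 0
    hists = []
    for x0 in range(0, width, stripe_width):
        hists.append([sum(row[x0:x0 + stripe_width]) for row in img])
    return hists


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Half-open (start, end) index ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def split_words(line_img: Image, min_gap: int) -> List[Tuple[int, int]]:
    """Column ranges of words in one text line.

    Uses the vertical projection profile (ink pixels per column) and
    merges ink runs separated by fewer than min_gap empty columns.
    Assumption: min_gap is a hypothetical fixed threshold.
    """
    if not line_img:
        return []
    profile = [sum(col) for col in zip(*line_img)]
    words: List[Tuple[int, int]] = []
    for start, end in runs_of_ink(profile):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)  # small gap: same word
        else:
            words.append((start, end))
    return words


# Toy 4x9 page: two text rows, one blank row, one more text row.
page = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 1],
]

hists = stripe_histograms(page, stripe_width=3)
line_rows_in_first_stripe = runs_of_ink(hists[0])
word_columns = split_words(page[0:2], min_gap=2)
```

On the toy page, each stripe histogram has one entry per image row, and the non-zero runs of a stripe's histogram give candidate line positions within that stripe. A real line segmenter would still need to join these candidates across neighbouring stripes, which is where the peak-valley relationship in the abstract comes in.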
