On the segmentation of multi-font printed Uygur scripts

In many OCR systems, character segmentation is a necessary preprocessing step for character recognition. It is an important step because incorrectly segmented characters are unlikely to be recognized correctly. The most difficult case for character segmentation is cursive script, and Uygur is a cursive script. This paper addresses the problem of segmenting printed Uygur characters in various fonts and sizes. The technique proceeds in three stages: line separation, word separation, and segmentation of each word into isolated characters. The last stage is a two-step algorithm consisting of topological segmentation and quasi-topological segmentation. Topological segmentation is based on tracing the outer contour of a given word. Quasi-topological segmentation bases the decision to section a character on a combination of feature extraction and character-width measurements. Our approach relies on features of the characters and fonts and on profile models.
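
The abstract does not say how line and word separation are carried out. As an illustration only, the sketch below uses projection profiles, a common technique for these two stages in printed-text OCR, and it may differ from the authors' method. The function names, the binary image representation (1 for ink, 0 for background), and the `min_gap` threshold are all assumptions made for this example.

```python
def projection_profile(image, axis):
    """Count ink pixels per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def find_runs(profile, min_gap=1):
    """Return half-open (start, end) spans where the profile is non-zero.

    Spans separated by fewer than min_gap empty positions are merged,
    so narrow gaps inside a word are not mistaken for word boundaries.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def separate_lines(image):
    """Split a page into text lines using the horizontal projection."""
    return find_runs(projection_profile(image, axis=0))


def separate_words(line_image, min_gap=2):
    """Split one text line into words using the vertical projection.

    min_gap is a hypothetical threshold; in practice it would depend
    on font and size.
    """
    return find_runs(projection_profile(line_image, axis=1), min_gap=min_gap)


# Tiny synthetic page: two text lines, the first holding two words.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
]
lines = separate_lines(page)
first_line = page[lines[0][0]:lines[0][1]]
words = separate_words(first_line, min_gap=2)
```

Splitting a word into characters is harder in a cursive script because connected letters leave no empty columns, which is why the abstract turns to contour tracing and character-width measurements for that stage instead of a plain profile.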