In this paper we present an effective character recognition algorithm intended mainly for typeset documents. Our aim was to design an algorithm that recognizes simple typeset documents quickly and reliably. For good results, the input document should contain characters from a single character set with a small number of symbols. This is not a strong restriction, since documents in practice usually have this property. The core recognition step of the algorithm is based on the Walsh transformation, which gives a detailed description of the image, such as its symmetry relations and the placement of foreground and background pixels. This motivated us to apply it to character recognition, and the algorithm proved fairly efficient and reliable for simple documents, since the feature vectors extracted by the Walsh transformation are easy to distinguish. Moreover, our method performed very well in tolerating different types of noise corruption.
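The Walsh-based feature extraction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes square binary glyphs whose side is a power of two and computes the 2-D Walsh transform F = W X W^T, with W in sequency order. It then keeps a hypothetical k × k block of lowest-sequency coefficients as the feature vector and classifies by nearest-prototype matching. The classifier is our assumption, since the abstract does not name one. The glyph shapes, `k = 4`, and all function names are illustrative.

```python
from typing import Dict, List

Matrix = List[List[int]]


def walsh_matrix(n: int) -> Matrix:
    """Return the n x n Walsh matrix in sequency order (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    # Sylvester construction: natural-order Hadamard matrix with +1/-1 entries.
    h: Matrix = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]

    # Sequency = number of sign changes along a row; sorting by it gives the
    # Walsh (sequency) ordering, in which row k has exactly k sign changes.
    def sign_changes(row: List[int]) -> int:
        return sum(1 for a, b in zip(row, row[1:]) if a != b)

    return sorted(h, key=sign_changes)


def walsh_2d(img: Matrix) -> Matrix:
    """Unnormalised 2-D Walsh transform F = W X W^T of a square binary image."""
    n = len(img)
    w = walsh_matrix(n)
    # Transform along the vertical axis: tmp = W X.
    tmp = [[sum(w[i][k] * img[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    # Transform along the horizontal axis: F = tmp W^T.
    return [[sum(tmp[i][k] * w[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def features(img: Matrix, k: int = 4) -> List[int]:
    """Hypothetical feature choice: the k x k lowest-sequency coefficients."""
    f = walsh_2d(img)
    return [f[i][j] for i in range(k) for j in range(k)]


def classify(img: Matrix, prototypes: Dict[str, Matrix], k: int = 4) -> str:
    """Assumed classifier: nearest prototype by squared Euclidean distance."""
    v = features(img, k)

    def dist(label: str) -> int:
        p = features(prototypes[label], k)
        return sum((a - b) ** 2 for a, b in zip(v, p))

    return min(prototypes, key=dist)


def parse(rows: List[str]) -> Matrix:
    """Turn '#'/'.' strings into a 0/1 pixel matrix ('#' = foreground)."""
    return [[1 if c == "#" else 0 for c in r] for r in rows]


# Illustrative 8x8 prototypes for a tiny character set.
GLYPHS = {
    "I": parse(["...##..."] * 8),
    "O": parse(["..####..", ".#....#.", "#......#", "#......#",
                "#......#", "#......#", ".#....#.", "..####.."]),
    "-": parse(["........"] * 3 + ["########"] * 2 + ["........"] * 3),
}

if __name__ == "__main__":
    noisy = [row[:] for row in GLYPHS["I"]]
    noisy[0][0] = 1  # two isolated "salt" noise pixels
    noisy[7][7] = 1
    print(classify(noisy, GLYPHS))
```

Keeping only low-sequency coefficients discards fine detail, so isolated noise pixels perturb the feature vector only slightly. In this toy case the `I` corrupted by two stray pixels is still matched to the `I` prototype.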