Segmentation Methods for Recognition of Machine-Printed Characters
This paper reports an investigation of some methods for isolating, or segmenting, characters during the reading of machine-printed text by optical character recognition systems. Two new segmentation algorithms using feature extraction techniques are presented; both are intended for use in the recognition of machine-printed lines of 10-, 11- and 12-pitch serif-type multifont characters. One of the methods, called quasi-topological segmentation, bases the decision to “section” a character on a combination of feature-extraction and character-width measurements. The other method, topological segmentation, involves feature extraction alone. The algorithms have been tested with an evaluation method that is independent of any particular recognition system. Test results are based on application of the algorithms to upper-case alphanumeric characters gathered from print sources that represent the existing world of machine printing. The topological approach demonstrated better performance on the test data than did the quasi-topological approach.
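To make concrete the idea of combining an image measurement with a character-width measurement, the following is a minimal sketch of a generic pitch-aware projection segmenter. It is not either of the algorithms presented in the paper, whose feature-extraction rules are not given in this abstract. It assumes a binarized text line stored as a list of rows of 0/1 pixels, and relies only on the fact that N-pitch print places N characters per inch, so the nominal cell width is dpi / pitch pixels. The names `column_profile`, `segment_line`, and the `slack` parameter are hypothetical.

```python
def column_profile(image):
    """Count ink pixels (value 1) in each column of a binary line image."""
    return [sum(col) for col in zip(*image)]


def segment_line(image, pitch, dpi, slack=0.25):
    """Return cut columns for a simple pitch-aware projection segmentation.

    Illustrative assumption, not the paper's method: each cut is placed at
    the column with the least ink inside a window centred on the nominal
    character width, with ties broken by closeness to that nominal position.
    """
    profile = column_profile(image)
    width = len(profile)
    nominal = dpi / pitch  # expected character cell width in pixels
    if nominal * (1 - slack) < 1:
        raise ValueError("resolution too low for the requested pitch")

    # Start at the first column that contains any ink.
    start = next((x for x, v in enumerate(profile) if v > 0), None)
    if start is None:
        return []

    cuts = [start]
    while True:
        lo = int(cuts[-1] + nominal * (1 - slack))
        hi = int(cuts[-1] + nominal * (1 + slack))
        if lo >= width:
            break
        hi = min(hi, width - 1)
        target = cuts[-1] + nominal
        # Prefer the emptiest column; among equals, the one nearest the pitch.
        best = min(range(lo, hi + 1),
                   key=lambda x: (profile[x], abs(x - target)))
        cuts.append(best)
    return cuts


# Three 4-pixel-wide characters separated by blank columns (50 dpi, 10-pitch).
separated = [[1, 1, 1, 1, 0] * 3, [1, 1, 1, 1, 0] * 3]
# Two touching characters: no blank column, so the width estimate decides.
touching = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 0]]
cuts_separated = segment_line(separated, pitch=10, dpi=50)
cuts_touching = segment_line(touching, pitch=10, dpi=50)
```

When a blank column exists near the expected boundary, the ink profile determines the cut; when characters touch, the profile is flat and the width estimate alone places the cut. A width-only fallback of this kind is one reason a segmenter might combine width with other measurements, though how the paper's quasi-topological method weighs the two is not stated here.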