Feature selection for low error rate OCR

Abstract. A scheme for selecting features for OCR systems has been developed. It has been applied successfully to several type fonts, yielding systems that recognize upper- and lower-case alphanumerics with fewer than one error per 10 000 processed characters. Data from a large number of test characters are collected and formatted to provide a basis for the actual selection of features, and the error rate of the resulting recognition systems is then verified. Considerable portions of this process have been automated, while the OCR system designer retains adequate opportunity to control and influence the selection.