OCR error rate versus rejection rate for isolated handprint characters

Over twenty-five organizations participating in the First Census OCR Systems Conference submitted confidence data as well as character classification data for the digit test in that conference. A three-parameter function of the rejection rate r is fit to the error rate versus rejection rate data derived from these submissions, and is found to fit very well over the range from r = 0 to r = 0.15. The probability distribution underlying the model e(r) curve is derived and shown to correspond to an inherently inefficient rejection process. With only a few exceptions, which appear to be insignificant, all of the organizations that submitted data to the conference for scoring seem to employ this same rejection process, with remarkable uniformity of efficiency relative to the maximum efficiency allowed for this process. Two measures of rejection efficiency are derived, and a practical definition of ideal OCR performance in the classification of segmented characters is proposed. Perfect rejection is shown to be achievable, but only at the cost of reduced classification accuracy in most practical situations. Human classification of a subset of the digit test suggests that there is considerable room for improvement in machine OCR before performance at the level of the proposed ideal is achieved.
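The rejection process referred to above can be sketched as simple confidence thresholding: reject the fraction r of characters with the lowest classifier confidence and measure the error rate e(r) among the accepted characters. The sketch below uses synthetic illustrative data (the confidence values and the 10% base error rate are assumptions, not conference results), and is not the paper's three-parameter model itself.

```python
def error_at_rejection(scores, r):
    """Error rate among accepted characters after rejecting the
    fraction r with the lowest confidence.

    scores: list of (confidence, is_correct) pairs
    r: rejection rate in [0, 1]
    """
    n = len(scores)
    k = round(r * n)
    # Sort by confidence and drop the k least-confident characters.
    accepted = sorted(scores, key=lambda s: s[0])[k:]
    wrong = sum(1 for _, ok in accepted if not ok)
    return wrong / len(accepted)

# Synthetic example: 90 correct classifications at higher confidence,
# 10 errors concentrated at lower confidence (hypothetical values).
scores = [(0.5 + 0.5 * i / 89, True) for i in range(90)] \
       + [(0.1 + 0.5 * i / 9, False) for i in range(10)]

for r in (0.0, 0.05, 0.10, 0.15):
    print(f"r = {r:.2f}  e(r) = {error_at_rejection(scores, r):.3f}")
```

Because the confidence ranges of correct and incorrect characters overlap, rejection eventually discards correct characters along with errors, which is the sense in which confidence thresholding is an inherently inefficient rejection process.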