The authors describe an experiment in estimating the Bayes error of an image classification problem: a difficult, practically important, two-class character recognition problem. The Bayes error gives the "intrinsic difficulty" of the problem, since it is the minimum error achievable by any classification method. Because deriving the Bayes error analytically appears to be hopeless for many realistically complex problems, the authors approach the task empirically. They proceed first by expressing the problem precisely in terms of ideal prototype images and an image defect model, and then by carrying out the estimation on pseudorandomly simulated data. Arriving at sharp estimates seems inevitably to require both large sample sizes (in the authors' trial, over a million images) and careful statistical extrapolation. The study of the data reveals many interesting statistics, which allow the prediction of the worst-case time/space requirements for any given classifier performance, expressed as a combination of error and reject rates.
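To make the notion of Bayes error concrete, the following minimal sketch estimates it by simulation on a toy problem where the exact value is known. It is not the authors' experiment: the two one-dimensional Gaussian classes, the function names, and the parameter `mu` are illustrative assumptions standing in for the prototype images and defect model. With equal priors and class densities N(-mu, 1) and N(+mu, 1), the optimal rule thresholds at zero and the Bayes error is Phi(-mu), so the simulated estimate can be checked against the analytic one.

```python
import math
import random


def bayes_error_analytic(mu: float) -> float:
    """Exact Bayes error for N(-mu, 1) vs N(+mu, 1) with equal priors: Phi(-mu)."""
    return 0.5 * math.erfc(mu / math.sqrt(2.0))


def bayes_error_simulated(mu: float, n: int, seed: int = 0) -> float:
    """Monte Carlo estimate: error rate of the optimal rule on n simulated samples."""
    rng = random.Random(seed)  # fixed seed so the pseudorandom data is reproducible
    errors = 0
    for _ in range(n):
        label = rng.choice((-1, 1))            # equal class priors (assumption)
        x = rng.gauss(label * mu, 1.0)         # toy "defect model": additive Gaussian noise
        predicted = 1 if x > 0 else -1         # optimal (Bayes) decision rule for this toy case
        errors += predicted != label
    return errors / n


if __name__ == "__main__":
    exact = bayes_error_analytic(1.0)
    for n in (1_000, 10_000, 100_000):
        estimate = bayes_error_simulated(1.0, n)
        print(f"n={n:>7}  estimate={estimate:.4f}  exact={exact:.4f}")
```

The standard error of such an estimate shrinks only as the inverse square root of the sample size, which is one reason sharp estimates call for very large samples. In a realistic problem the optimal rule is unknown, so the error has to be approached from classifier error rates at increasing sample sizes, which this toy case sidesteps.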