Classification Error for a Very Large Number of Classes

Classification error is analyzed for the case in which the number of possible classes may be on the order of a hundred or more. The error in assigning a pattern to a single class is shown to depend mainly on the average nearest-neighbor distance between class means, the noise level, and the effective dimensionality of the class-mean distribution; it depends only weakly on other aspects of that distribution, on noise correlation, and on the number of classes. Because single-class error is large, separating the classes into groups is also explored. Group classification error has the same properties as single-class error, but its size is moderated by the Bayes overlap between groups. Standard curves are provided for predicting single-class and group error. The effect of pattern blurring on classification error and the nearest-neighbor distance statistics throughout a distribution are also discussed.
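The dependence of single-class error on noise level relative to the spacing of class means can be illustrated with a small Monte Carlo experiment. The sketch below is not the paper's procedure: it assumes class means drawn uniformly in a unit hypercube, isotropic Gaussian noise, and a nearest-mean decision rule, and the function name `simulate_single_class_error` and all of its parameters are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_single_class_error(n_classes=200, dim=4, noise_sigma=0.1,
                                n_trials=2000, seed=0):
    """Estimate nearest-mean classification error for many classes.

    Assumptions (not from the paper): class means are uniform in the unit
    hypercube, noise is isotropic Gaussian, classes are equally likely.
    Returns (error_rate, average_nearest_neighbor_distance_between_means).
    """
    rng = np.random.default_rng(seed)

    # Class means: one point per class in a dim-dimensional unit cube.
    means = rng.uniform(0.0, 1.0, size=(n_classes, dim))

    # Average nearest-neighbor distance between class means.
    pair_d2 = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(pair_d2, np.inf)  # exclude each mean's distance to itself
    avg_nn_distance = float(np.sqrt(pair_d2.min(axis=1)).mean())

    # Noisy patterns: pick a true class, then add Gaussian noise to its mean.
    labels = rng.integers(0, n_classes, size=n_trials)
    noise = noise_sigma * rng.standard_normal((n_trials, dim))
    patterns = means[labels] + noise

    # Classify each pattern to the class with the nearest mean.
    d2 = ((patterns[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    predicted = d2.argmin(axis=1)

    error_rate = float((predicted != labels).mean())
    return error_rate, avg_nn_distance


if __name__ == "__main__":
    for sigma in (0.01, 0.05, 0.1, 0.3):
        err, nn = simulate_single_class_error(noise_sigma=sigma)
        print(f"sigma={sigma:.2f}  sigma/avg_nn={sigma / nn:.2f}  error={err:.3f}")
```

Under these assumptions, rerunning the loop with different values of `n_classes` or `dim` gives a rough way to check how the estimated error tracks the ratio of noise level to average nearest-neighbor distance.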
