Choice of the smoothing parameter and efficiency of k-nearest neighbor classification

Abstract

A simulation study was performed to investigate the sensitivity of the k-nearest neighbor (NN_k) classification rule to the choice of k. The optimal choice of k was found to depend on the dimension of the sample space, the size of the space, the covariance structure, and the sample proportions. Nearest neighbor rules using the values of k suggested by the simulations had correct classification rates at least as high as those of the linear discriminant function and logistic regression. In particular, the rule became more efficient as the difference between the covariance matrices increased, and also when the sample proportions differed widely. An adaptive rule that selects k by iteratively maximizing the local Mahalanobis distance is shown to be efficient, removing the need to know the underlying population variance-covariance structure.
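To make the role of k concrete, here is a minimal sketch of the NN_k rule: a query point receives the majority label among its k nearest training points. It then picks k from a candidate list by maximizing the leave-one-out correct classification rate. This is only an illustration under stated assumptions. It uses plain Euclidean distance, and leave-one-out selection is a generic stand-in, not the adaptive Mahalanobis-distance rule described above. The function names (`knn_predict`, `loo_correct_rate`, `choose_k`) are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    Assumption: Euclidean distance. Vote ties go to the label seen first
    among the nearest-first neighbours (Counter keeps insertion order).
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_correct_rate(xs, ys, k):
    """Leave-one-out correct classification rate of the NN_k rule."""
    correct = 0
    for i in range(len(xs)):
        # Hold out point i and classify it using the remaining points.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


def choose_k(xs, ys, candidates):
    """Hypothetical selector: the candidate k with the best leave-one-out rate.

    max() returns the first maximizer, so ties favour earlier (smaller) k.
    """
    return max(candidates, key=lambda k: loo_correct_rate(xs, ys, k))


# Two small, well-separated groups of three points each.
xs = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
ys = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(xs, ys, (0.15, 0.15), k=3))  # A
print(choose_k(xs, ys, [1, 3, 5]))             # 1
```

Even this toy example shows the sensitivity to k. With three points per group, k = 5 forces every held-out point to be outvoted by the opposite group, so its leave-one-out rate falls to 0. By contrast, k = 1 and k = 3 classify every point correctly.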