Selection of distance metrics and feature subsets for K-nearest neighbor classifiers

The k-nearest neighbor (kNN) classifier is a popular and effective method for associating a feature vector with a unique element of a known, finite set of classes. A common choice for the distance metric used in kNN classification is the quadratic distance $Q(x, A, y) = (x - y)' A (x - y)$, where $x$ and $y$ are $n$-vectors of features, $A$ is a symmetric $n \times n$ matrix, and the prime denotes transpose. For finite sets of training samples, the choice of the matrix $A$ is important in optimizing classifier performance. We show that $A$ can be approximately optimized by gradient descent on a sigmoidally smoothed estimate of the classifier's probability of error. We describe an algorithm for performing this metric selection and compare its performance with that of other methods. We demonstrate that adding noise during the descent process can reduce the effects of overfitting. We further suggest how feature subset selection can be treated as a special case of this metric selection.
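The two ingredients of the approach can be sketched in a few lines of NumPy: kNN classification under the quadratic distance $Q(x, A, y)$, and a sigmoidally smoothed leave-one-out error estimate that varies smoothly with $A$ and can therefore be driven by gradient descent. This is a minimal illustration, not the paper's implementation; the function names (`knn_predict`, `smoothed_error`) and the particular margin-based smoothing below are assumptions for the sake of a runnable example.

```python
import numpy as np

def quadratic_distance(x, y, A):
    """Q(x, A, y) = (x - y)' A (x - y) for a symmetric matrix A."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ A @ d)

def knn_predict(x, train_X, train_y, A, k=3):
    """Classify x by majority vote among its k nearest training
    samples under the quadratic distance induced by A."""
    dists = np.array([quadratic_distance(x, t, A) for t in train_X])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def smoothed_error(A, X, y, beta=5.0):
    """Sigmoidally smoothed leave-one-out 1-NN error estimate.

    For each sample, the quadratic distance to the nearest same-class
    neighbor is compared with the distance to the nearest other-class
    neighbor; the margin is squashed through a sigmoid so the estimate
    is differentiable in A (one possible smoothing, for illustration).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    errs = []
    for i in range(len(X)):
        dists = np.array([quadratic_distance(X[i], X[j], A)
                          for j in range(len(X))])
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        diff = [j for j in range(len(X)) if y[j] != y[i]]
        margin = dists[same].min() - dists[diff].min()
        errs.append(1.0 / (1.0 + np.exp(-beta * margin)))
    return float(np.mean(errs))
```

With $A$ set to the identity matrix the quadratic distance reduces to squared Euclidean distance; a diagonal $A$ with some zero entries effectively removes the corresponding features, which is how feature subset selection arises as a special case of metric selection.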
