Locally Determining the Number of Neighbors in the k-Nearest Neighbor Rule Based on Statistical Confidence

The k-nearest neighbor rule is one of the most attractive pattern classification algorithms. In practice, the value of k is usually determined by cross-validation. In this work, we propose a new method that locally determines the number of nearest neighbors based on the concept of statistical confidence. We define the confidence associated with decisions made by the majority rule from a finite number of observations and use it as a criterion for determining how many nearest neighbors are needed. The new algorithm is tested on several real-world datasets and yields results comparable to those of the k-nearest neighbor rule. Unlike the k-nearest neighbor rule, which uses a fixed number of nearest neighbors throughout the feature space, our method locally adjusts the number of neighbors until a satisfactory level of confidence is reached. In addition, the statistical confidence provides a natural way to balance the trade-off between the reject rate and the error rate by rejecting patterns with low confidence levels.
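The abstract does not give the exact confidence formula, so the following is only a sketch under an assumed definition. For a two-way vote of k1 against k2, confidence is taken here to be the posterior probability, under a uniform Beta(1, 1) prior, that the true proportion of the majority class exceeds 1/2; multi-class votes are reduced to "majority class versus all others". The names `confidence` and `classify` and the parameters `target` (required confidence) and `k_max` (neighbor cap beyond which a pattern is rejected) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def confidence(k1: int, k2: int) -> float:
    """Assumed confidence of a majority vote of k1 against k2 (not the paper's formula).

    Computes P(p > 1/2) for p ~ Beta(k1 + 1, k2 + 1), i.e. a uniform prior
    updated with the observed votes. For integer counts this reduces to
    sum_{j=0}^{k1} C(n, j) / 2^n with n = k1 + k2 + 1.
    """
    n = k1 + k2 + 1
    return sum(math.comb(n, j) for j in range(k1 + 1)) / 2 ** n


def classify(
    x: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
    target: float = 0.9,
    k_max: int = 25,
) -> Tuple[Optional[str], int, float]:
    """Grow k one neighbor at a time until the majority vote reaches `target`.

    Returns (label, k, conf). label is None when the pattern is rejected
    because `target` was not reached within k_max neighbors.
    """
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    votes: Counter = Counter()
    conf = 0.5
    k = 0
    for k, i in enumerate(order[:k_max], start=1):
        votes[train_y[i]] += 1
        label, k1 = votes.most_common(1)[0]
        # One-vs-rest (assumption): majority class against all other classes.
        conf = confidence(k1, k - k1)
        if conf >= target:
            return label, k, conf
    # Confidence never reached the target: reject the pattern.
    return None, k, conf


if __name__ == "__main__":
    train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
    train_y = ["a", "a", "a", "b", "b", "b"]
    # Three unanimous neighbors give 15/16 >= 0.9, so the search stops at k = 3.
    print(classify((0, 0), train_x, train_y))  # ('a', 3, 0.9375)
    # A 1-1 split has confidence exactly 0.5, so this query is rejected.
    print(classify((1, 0), [(0, 0), (2, 0)], ["a", "b"]))  # (None, 2, 0.5)
```

Under this assumed definition, two unanimous neighbors give only 7/8 = 0.875, so a target of 0.9 forces at least three. Raising `target` or lowering `k_max` makes more queries fall back to rejection, which illustrates the reject-rate versus error-rate trade-off described in the abstract.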

Leon N. Cooper | Predrag Neskovic | Jigang Wang