On optimum choice of k

A major issue in k-nearest neighbor classification is how to choose the optimum value of the neighborhood parameter k. Popular cross-validation techniques often fail to give clear guidance in selecting k, mainly because the estimated misclassification rate frequently has multiple minimizers. This article investigates a Bayesian method for this problem that resolves the issue of multiple optimizers. The utility of the proposed method is illustrated using some benchmark data sets.
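The multiple-minimizer problem can be seen in a small experiment. The sketch below is only an illustration and is not the Bayesian method of the article: it computes the leave-one-out misclassification rate of a plain k-NN classifier over a grid of k and reports every k that attains the minimum. The toy data, the function names `loo_error` and `cv_minimizers`, the Euclidean metric, and the tie-breaking rule are all assumptions made for this example.

```python
import random
from collections import Counter


def loo_error(points, labels, k):
    """Leave-one-out misclassification rate of a k-NN classifier."""
    n = len(points)
    errors = 0
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        # Majority vote; ties go to the smallest label (an arbitrary choice).
        top = max(votes.values())
        predicted = min(lab for lab, c in votes.items() if c == top)
        errors += predicted != labels[i]
    return errors / n


def cv_minimizers(points, labels, ks):
    """Return the LOO error for each k and every k attaining the minimum."""
    rates = {k: loo_error(points, labels, k) for k in ks}
    best = min(rates.values())
    return rates, [k for k in ks if rates[k] == best]


# Hypothetical toy data: two overlapping Gaussian classes in the plane.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
points += [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(40)]
labels = [0] * 40 + [1] * 40

rates, minimizers = cv_minimizers(points, labels, range(1, 22, 2))
print(rates)
print("k values attaining the minimum estimated error:", minimizers)
```

Because the estimated error rate is a count divided by the sample size, it takes only finitely many values, so several values of k can share the same minimum. When `minimizers` contains more than one entry, cross-validation alone does not say which k to prefer.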
