Asymptotic expansions of the k nearest neighbor risk

The finite-sample risk of the k nearest neighbor classifier that uses a weighted L_p metric as a measure of class similarity is examined. For a family of classification problems with smooth distributions in R^n, an asymptotic expansion for the risk is obtained in decreasing fractional powers of the reference sample size. An analysis of the leading expansion coefficients reveals that the optimal weighted L_p metric, that is, the metric that minimizes the finite-sample risk, tends to a weighted Euclidean (i.e., L_2) metric as the sample size is increased. Numerical simulations corroborate this finding for a pattern recognition problem with normal class-conditional densities.
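The setting described above can be illustrated with a minimal Python sketch. It is not the authors' code or experimental setup. It assumes a per-coordinate weighting of the form (sum_i w_i |x_i - y_i|^p)^(1/p), which is only one possible reading of "weighted L_p metric", and a hypothetical two-class problem in R^2 with unit-variance normal class-conditional densities. The names `weighted_lp_distance`, `knn_classify`, and `estimate_risk` are illustrative.

```python
import random
from collections import Counter


def weighted_lp_distance(x, y, weights, p):
    """Assumed form of a weighted L_p distance: (sum_i w_i * |x_i - y_i|**p) ** (1/p)."""
    return sum(w * abs(a - b) ** p for w, a, b in zip(weights, x, y)) ** (1.0 / p)


def knn_classify(query, samples, labels, k, weights, p):
    """Majority vote among the k reference samples closest to `query`."""
    order = sorted(
        range(len(samples)),
        key=lambda i: weighted_lp_distance(query, samples[i], weights, p),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def estimate_risk(m, k, weights, p, trials=200, seed=0):
    """Monte Carlo estimate of the error rate for a reference sample of size m.

    Hypothetical problem: two equiprobable classes in R^2 with unit-variance
    normal densities whose means are (-1, 0) and (+1, 0).
    """
    rng = random.Random(seed)

    def draw():
        label = rng.randint(0, 1)
        mean = 1.0 if label else -1.0
        return (rng.gauss(mean, 1.0), rng.gauss(0.0, 1.0)), label

    errors = 0
    for _ in range(trials):
        # Fresh reference sample and one test point per trial.
        pairs = [draw() for _ in range(m)]
        samples = [s for s, _ in pairs]
        labels = [l for _, l in pairs]
        query, truth = draw()
        errors += knn_classify(query, samples, labels, k, weights, p) != truth
    return errors / trials


if __name__ == "__main__":
    # Compare an L_1 and an L_2 metric at two reference sample sizes.
    for m in (10, 40):
        for p in (1, 2):
            risk = estimate_risk(m, k=3, weights=(1.0, 1.0), p=p)
            print(f"m={m:3d}  p={p}  estimated risk={risk:.3f}")
```

With an odd k and two classes the vote cannot tie, which keeps the sketch short. The estimates are noisy at this number of trials, so the output should be read as a demonstration of the setup rather than as evidence for the expansion itself.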
