Optimal weighted nearest neighbour classifiers

We derive an asymptotic expansion for the excess risk (regret) of a weighted nearest-neighbour classifier. This allows us to find the asymptotically optimal vector of nonnegative weights, which has a rather simple form. We show that the ratio of the regret of this classifier to that of an unweighted k-nearest neighbour classifier depends asymptotically only on the dimension d of the feature vectors, and not on the underlying populations. The improvement is greatest when d=4, but thereafter decreases as $d\rightarrow\infty$. The popular bagged nearest neighbour classifier can also be regarded as a weighted nearest neighbour classifier, and we show that its corresponding weights are somewhat suboptimal when d is small (in particular, worse than those of the unweighted k-nearest neighbour classifier when d=1), but are close to optimal when d is large. Finally, we argue that improvements in the rate of convergence are possible under stronger smoothness assumptions, provided we allow negative weights. Our findings are supported by an empirical performance comparison on both simulated and real data sets.
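To make the object of study concrete, the following is a minimal sketch of a weighted nearest-neighbour classifier, in which the i-th nearest neighbour of a query point votes with weight w_i. The function names, the toy data and the use of Euclidean distance are illustrative assumptions, not taken from the paper. The formula in `decreasing_weights` is an assumed form of the optimal nonnegative weights, which the abstract does not state; it should be checked against the paper before use. The unweighted k-nearest neighbour classifier is the special case w_i = 1/k for i ≤ k.

```python
import math
from collections import defaultdict


def weighted_nn_classify(train_x, train_y, query, weights):
    """Classify `query` by a weighted vote over its nearest neighbours.

    weights[i] is the weight of the (i+1)-th nearest neighbour;
    neighbours beyond len(weights) implicitly receive weight zero.
    """
    # Order training indices by Euclidean distance to the query (an assumption).
    order = sorted(range(len(train_x)),
                   key=lambda j: math.dist(train_x[j], query))
    votes = defaultdict(float)
    for w, j in zip(weights, order):
        votes[train_y[j]] += w
    # Predict the class with the largest total weight.
    return max(votes, key=votes.get)


def uniform_weights(k):
    """Weights of the unweighted k-nearest neighbour classifier."""
    return [1.0 / k] * k


def decreasing_weights(k, d):
    """Assumed form of the optimal nonnegative weights in dimension d.

    ASSUMPTION: this formula is not given in the abstract above.
    w_i = (1/k) * [1 + d/2 - d / (2 k^(2/d)) * (i^(1+2/d) - (i-1)^(1+2/d))]
    for i = 1, ..., k, and zero otherwise.
    """
    a = 1.0 + 2.0 / d
    scale = d / (2.0 * k ** (2.0 / d))
    return [(1.0 + d / 2.0 - scale * (i ** a - (i - 1) ** a)) / k
            for i in range(1, k + 1)]


# Hypothetical one-dimensional toy data: two points of class 0, three of class 1.
train_x = [(0.0,), (0.1,), (1.0,), (1.1,), (1.2,)]
train_y = [0, 0, 1, 1, 1]
query = (0.2,)

pred_uniform = weighted_nn_classify(train_x, train_y, query, uniform_weights(5))
pred_weighted = weighted_nn_classify(train_x, train_y, query,
                                     decreasing_weights(5, 1))
```

On this toy sample with k = 5, the uniform vote is decided by the three more distant points of class 1, whereas the decreasing weights put enough mass on the two closest points to predict class 0. This only illustrates the mechanics of the weighting; it says nothing about the regret of either rule.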
