Minimax Rate Optimal Adaptive Nearest Neighbor Classification and Regression

The k-nearest neighbor (kNN) method is a simple and popular statistical method for classification and regression. For both problems, existing work has shown that if the distribution of the feature vector has bounded support and its probability density function is bounded away from zero on that support, then the convergence rate of the standard kNN method, in which the same k is used for all test samples, is minimax optimal. In contrast, if the distribution has unbounded support, we show that there is a gap between the convergence rate achieved by the standard kNN method and the minimax lower bound. To close this gap, we propose an adaptive kNN method that selects a different k for each test sample. Our selection rule does not require precise knowledge of the underlying feature distribution, and the proposed method significantly outperforms the standard one. We characterize the convergence rate of the adaptive method and show that it matches the minimax lower bound.
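
To make the idea of per-sample selection of k concrete, below is a minimal sketch of an adaptive kNN regressor in which k grows with the local density of training points around each query, estimated by counting neighbors inside a fixed ball. This is only an illustration of density-driven selection in the spirit of the abstract; the radius r, constant c, and exponent alpha are placeholder parameters, not the paper's actual selection rule.

```python
import numpy as np

def adaptive_knn_regress(X_train, y_train, X_test, r=0.5, c=1.0, alpha=0.6):
    """Adaptive kNN regression sketch.

    For each test point x, k is chosen from a proxy for the local density:
    k = clip(c * (#train points within radius r of x)^alpha, 1, n).
    The radius r, constant c, and exponent alpha are illustrative
    placeholders, not the selection rule from the paper.
    """
    n = len(X_train)
    preds = np.empty(len(X_test))
    for i, x in enumerate(X_test):
        dists = np.linalg.norm(X_train - x, axis=1)
        local_count = np.sum(dists <= r)            # local density proxy
        k = int(np.clip(c * local_count ** alpha, 1, n))
        nearest = np.argsort(dists)[:k]             # indices of k nearest
        preds[i] = y_train[nearest].mean()          # average their labels
    return preds

# Example usage on synthetic 1-D data:
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))                       # unbounded support
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
X_query = np.array([[0.0], [2.5]])                  # dense vs. sparse region
print(adaptive_knn_regress(X, y, X_query))
```

In sparse regions of an unbounded-support distribution, the ball around a query contains few training points, so the rule automatically uses a small k there and a larger k in dense regions, which is the intuition behind closing the gap to the minimax rate.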
