Analysis of KNN Density Estimation

We analyze the $\ell_1$ and $\ell_\infty$ convergence rates of the k nearest neighbor (kNN) density estimation method. Our analysis covers two cases, depending on whether the support set is bounded. In the first case, the probability density function has bounded support and is bounded away from zero. We show that kNN density estimation is minimax optimal under both the $\ell_1$ and $\ell_\infty$ criteria if the support set is known. If the support set is unknown, the convergence rate of the $\ell_1$ error is not affected, while the $\ell_\infty$ error does not converge. In the second case, the probability density function can approach zero and is smooth everywhere; moreover, the Hessian is assumed to decay with the density values. For this case, our result shows that the $\ell_\infty$ error of kNN density estimation is nearly minimax optimal. The $\ell_1$ error does not reach the minimax lower bound, but it is better than that of kernel density estimation.
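The sketch below is not taken from the paper. It assumes the standard kNN density estimator $\hat f(x) = k / (n V_d \rho_k(x)^d)$, where $\rho_k(x)$ is the distance from $x$ to its k-th nearest sample and $V_d$ is the volume of the unit ball in $\mathbb{R}^d$; the paper's exact normalization (for example $k$ versus $k-1$) may differ. It also illustrates the bounded-support case on Uniform[0, 1]. At the boundary point $x = 0$, the kNN ball extends outside the support, so the unmodified estimate is close to 1/2 instead of 1, and this bias persists as $n$ grows. That matches the claim that the $\ell_\infty$ error does not converge when the support is unknown. The biased region has width on the order of $\rho_k$, so its contribution to the $\ell_1$ error shrinks. The support-aware variant divides by the length of the ball intersected with the known interval. It is a hypothetical illustration of how knowledge of the support can be used, not necessarily the construction analyzed in the paper. All function names are hypothetical.

```python
import math
import random


def unit_ball_volume(d):
    # Volume of the unit Euclidean ball in R^d: pi^(d/2) / Gamma(d/2 + 1)
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_radius(x, samples, k):
    # Distance from the query point x to its k-th nearest sample (Euclidean)
    dists = sorted(math.dist(x, s) for s in samples)
    return dists[k - 1]


def knn_density(x, samples, k):
    # Standard kNN estimate (assumed form): f_hat(x) = k / (n * V_d * rho_k(x)^d)
    n, d = len(samples), len(x)
    r = knn_radius(x, samples, k)
    return k / (n * unit_ball_volume(d) * r ** d)


def knn_density_known_interval(x, samples, k, lo=0.0, hi=1.0):
    # Hypothetical 1-D variant assuming the support [lo, hi] is known:
    # divide by the length of [x - r, x + r] intersected with [lo, hi]
    # instead of the full ball length 2r.
    r = knn_radius((x,), samples, k)
    length = min(x + r, hi) - max(x - r, lo)
    return k / (len(samples) * length)


if __name__ == "__main__":
    random.seed(0)
    n, k = 20000, 200
    # Samples from Uniform[0, 1], stored as 1-tuples; the true density is 1.
    samples = [(random.random(),) for _ in range(n)]
    for x0 in (0.5, 0.0):
        naive = knn_density((x0,), samples, k)
        aware = knn_density_known_interval(x0, samples, k)
        print(f"x={x0}: naive={naive:.3f}, support-aware={aware:.3f}")
```

In the interior point $x = 0.5$, both estimates should be near 1. At $x = 0$, the naive estimate should sit near 0.5, while the support-aware one should return to roughly 1. With $k = 200$, the relative fluctuation is on the order of $1/\sqrt{k}$, so the printed values will vary slightly with the seed.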
