A weighted k-nearest neighbor density estimate for geometric inference

Motivated by a broad range of potential applications in topological and geometric inference, we introduce a weighted version of the k-nearest neighbor density estimate. We establish several pointwise consistency results for this estimate and present a general central limit theorem under the weakest possible conditions. In addition, we obtain a strong approximation result and discuss the choice of the optimal set of weights. In particular, the classical k-nearest neighbor estimate turns out not to be optimal, in a sense made precise in the manuscript. The proposed method has been implemented to recover level sets in both simulated and real-life data.
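To make the idea of weighting the nearest-neighbor distances concrete, here is a minimal Python sketch. The abstract does not state the estimate's formula, so the form used below is an assumption: the ratio of the weighted neighbor ranks to n times the unit-ball volume times the weighted d-th powers of the ordered neighbor distances. The function names `unit_ball_volume`, `weighted_knn_density`, and `classical_knn_density` are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def unit_ball_volume(d):
    """Volume of the Euclidean unit ball in dimension d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def weighted_knn_density(x, sample, weights):
    """Assumed weighted k-NN density estimate at the point x.

    weights[j-1] is the weight given to the j-th nearest neighbor of x.
    Assumed form (not taken from the abstract):
        sum_j w_j * j / (n * V_d * sum_j w_j * R_j(x)**d)
    where R_j(x) is the distance from x to its j-th nearest sample point.
    """
    sample = np.asarray(sample, dtype=float)
    if sample.ndim == 1:
        sample = sample[:, None]  # treat a flat array as n points in dimension 1
    x = np.atleast_1d(np.asarray(x, dtype=float))
    weights = np.asarray(weights, dtype=float)
    n, d = sample.shape
    m = len(weights)  # only the m nearest neighbors carry weight

    # Ordered distances from x to its m nearest sample points.
    dists = np.sort(np.linalg.norm(sample - x, axis=1))[:m]
    ranks = np.arange(1, m + 1)

    numerator = np.dot(weights, ranks)
    denominator = n * unit_ball_volume(d) * np.dot(weights, dists ** d)
    return numerator / denominator


def classical_knn_density(x, sample, k):
    """Classical k-NN estimate k / (n * V_d * R_k(x)**d), as a special case."""
    weights = np.zeros(k)
    weights[k - 1] = 1.0  # all mass on the k-th neighbor
    return weighted_knn_density(x, sample, weights)
```

Putting all the weight on the k-th neighbor recovers the classical estimate, which is why the classical estimate can be compared against other weight choices within the same family. A level set could then be approximated by evaluating the estimate on a grid and thresholding it, though the abstract does not say how the authors do this.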
