Very fast optimal bandwidth selection for univariate kernel density estimation

Most automatic bandwidth selection procedures for kernel density estimates require estimation of quantities involving the density derivatives. Estimation of modes and inflexion points of densities also requires derivative estimates. The computational complexity of evaluating the density derivative at M evaluation points given N sample points from the density is O(MN). In this paper we propose a computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel based density derivative estimation that reduces the computational complexity from O(MN) to linear O(N + M). The constant depends on the desired arbitrary accuracy, ε. We apply the density derivative evaluation procedure to estimate the optimal bandwidth for kernel density estimation, a process that is often intractable for large data sets. For example, for N = M = 409,600 points, direct evaluation of the density derivative takes around 12.76 hours, while the fast evaluation requires only 65 seconds with an error of around 10^{-12}. Algorithm details, error bounds, a procedure to choose the parameters, and numerical experiments are presented. We demonstrate the speedup achieved on bandwidth selection using the solve-the-equation plug-in method. We also demonstrate that the proposed procedure can be extremely useful for speeding up exploratory projection pursuit techniques.
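To make the O(MN) baseline concrete, the following is a minimal sketch of the direct evaluation that the fast algorithm is meant to replace. It is not the proposed ε-exact method. It assumes the standard Gaussian kernel derivative estimator, in which the r-th derivative of the kernel is written with the probabilists' Hermite polynomial He_r. The function name `gaussian_kde_derivative_direct` and its parameter names are illustrative and do not come from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def gaussian_kde_derivative_direct(x, y, h, r):
    """Directly evaluate the r-th derivative of a Gaussian KDE.

    x : N sample points, y : M evaluation points, h : bandwidth,
    r : derivative order (r = 0 gives the density estimate itself).
    Cost is O(M * N) time and memory, since every pair (y_j, x_i) is visited.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Scaled pairwise differences, shape (M, N).
    u = (y[:, None] - x[None, :]) / h

    # Coefficient vector selecting the single polynomial He_r.
    coeffs = np.zeros(r + 1)
    coeffs[r] = 1.0

    # r-th derivative of the Gaussian kernel: (-1)^r He_r(u) exp(-u^2 / 2) / sqrt(2 pi).
    terms = hermeval(u, coeffs) * np.exp(-0.5 * u**2)

    scale = (-1.0) ** r / (np.sqrt(2.0 * np.pi) * x.size * h ** (r + 1))
    return scale * terms.sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.standard_normal(2000)   # N sample points
    grid = np.linspace(-3.0, 3.0, 5)      # M evaluation points
    print(gaussian_kde_derivative_direct(samples, grid, h=0.3, r=2))
```

Because this sketch materializes the full M x N matrix of pairwise differences, it becomes impractical at sizes such as the N = M = 409,600 case quoted above, which is the regime the linear-time approximation targets.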
