Fast Computation of Kernel Estimators

Evaluating a kernel density estimate (or its derivatives) at m evaluation points given n sample points costs O(nm) operations, which is prohibitively expensive for large datasets. Approximate methods such as binning can speed up the computation, but they lack precise control over the accuracy of the approximation: there is no straightforward way to choose the binning parameters a priori to achieve a desired approximation error. We propose a novel, computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel-based density derivative estimation that reduces the computational complexity from O(nm) to linear O(n+m). The user specifies a desired accuracy ε, and the algorithm guarantees that the actual error between the approximation and the original kernel estimate is always less than ε. We also apply the proposed fast algorithm to speed up automatic bandwidth selection procedures. We compare our method with the best available binning methods in terms of speed and accuracy. Our experimental results show that the proposed method is almost twice as fast as the best binning methods and around five orders of magnitude more accurate. The software for the proposed method is available online.
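For reference, the O(nm) baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of direct evaluation only, not the proposed fast algorithm. It assumes the standard form of the r-th derivative estimate, (1/(n h^(r+1))) Σ K^(r)((y_j − x_i)/h), and uses the identity that the r-th derivative of the Gaussian kernel equals (−1)^r He_r(u) times the Gaussian, where He_r is the probabilists' Hermite polynomial. The function name `kde_derivative_direct` and its parameter names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def kde_derivative_direct(x, y, h, r=0):
    """Directly evaluate the r-th derivative of a Gaussian KDE.

    x : n sample points, y : m evaluation points, h : bandwidth.
    Cost is O(n*m) in both time and memory (hypothetical helper,
    shown only as the quadratic baseline).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Scaled pairwise differences, shape (m, n): one row per evaluation point.
    u = (y[:, None] - x[None, :]) / h

    # Coefficient vector selecting the single polynomial He_r.
    coeffs = np.zeros(r + 1)
    coeffs[r] = 1.0
    hermite = hermeval(u, coeffs)

    # Standard Gaussian kernel evaluated at every pair.
    gauss = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

    # Sum over samples and apply the 1 / (n * h^(r+1)) normalization.
    return (-1.0) ** r * (hermite * gauss).sum(axis=1) / (x.size * h ** (r + 1))
```

With r=0 this returns the ordinary density estimate. Because the sketch materializes the full m-by-n matrix of pairwise differences, both time and memory grow with the product nm, which is the cost that the binning methods and the proposed ε-exact algorithm aim to avoid.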
