Progress in data-based bandwidth selection for kernel density estimation

We review the extensive recent literature on automatic, data-based selection of a global smoothing parameter in univariate kernel density estimation. Proposals are presented in a unified framework, with considerable reference to their theoretical properties. The results of a major simulation study of the practical performance of many of these methods are summarised, and our remarks are further supported by a description of some of our practical experience with real datasets. Our comparison of the methods' practical performance demonstrates that the improvements to be gained by using the better methods can be, and often are, considerable. It will be seen that optimal theoretical performance (up to the bounds derived by Hall and Marron, 1991) and acceptable practical performance are not achieved by the same techniques. We put much effort into making good practical choices whenever options arise. We emphasise that arguably the two best-known bandwidth selection methods cannot be advocated for general practical use: "least squares cross-validation", which suffers from too much variability, and normal-based "rules of thumb", which are too biased towards oversmoothing. A number of methods that do seem worthy of further consideration are listed. We show why our current overall preference is for the method of Sheather and Jones (1991). It is hoped that the lessons learned in this comparatively simple setting will also prove useful in many other smoothing situations.
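The two methods criticised above can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code or any particular library's API. It assumes a Gaussian kernel, uses the sample standard deviation s in the normal-based rule of thumb h = (4/(3n))^(1/5) s (roughly 1.06 s n^(-1/5)), and minimises the least squares cross-validation criterion LSCV(h) = integral of fhat_h^2 minus (2/n) times the sum over i of the leave-one-out estimate fhat_{h,-i}(X_i) by a crude grid search. The function names, the grid limits and the simulated mixture are hypothetical choices made for this sketch.

```python
import math
import random
import statistics


def normal_reference_bandwidth(x):
    """Normal-based rule of thumb for a Gaussian kernel (hypothetical helper).

    h = (4 / (3 n))^(1/5) * s, about 1.06 * s * n^(-1/5), where s is the
    sample standard deviation. AMISE-optimal when the true density is normal.
    """
    n = len(x)
    return (4.0 / (3.0 * n)) ** 0.2 * statistics.stdev(x)


def pairwise_differences(x):
    """All X_i - X_j for i < j, computed once and reused across bandwidths."""
    n = len(x)
    return [x[i] - x[j] for i in range(n) for j in range(i + 1, n)]


def lscv_score(x, h, diffs=None):
    """Least squares cross-validation criterion with a Gaussian kernel:

    LSCV(h) = integral of fhat_h^2 - (2/n) * sum_i fhat_{h,-i}(X_i).
    """
    n = len(x)
    if diffs is None:
        diffs = pairwise_differences(x)
    conv0 = 1.0 / (2.0 * math.sqrt(math.pi))  # N(0, 2) density at 0, i.e. K*K(0)
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)     # N(0, 1) density at 0, i.e. K(0)
    s_conv = 0.0
    s_phi = 0.0
    for d in diffs:
        u = d / h
        s_conv += math.exp(-u * u / 4.0)
        s_phi += math.exp(-u * u / 2.0)
    # Integral of fhat^2: n diagonal terms plus each i < j pair counted twice.
    int_f2 = conv0 * (n + 2.0 * s_conv) / (n * n * h)
    # Sum over i of the leave-one-out estimate at X_i (each pair counted twice).
    sum_loo = 2.0 * phi0 * s_phi / ((n - 1) * h)
    return int_f2 - 2.0 * sum_loo / n


def lscv_bandwidth(x, num=60, lo=0.05, hi=2.0):
    """Minimise LSCV over a log-spaced grid from lo*h0 to hi*h0, where h0 is
    the normal-reference bandwidth. The grid limits are arbitrary choices."""
    diffs = pairwise_differences(x)
    h0 = normal_reference_bandwidth(x)
    best_h, best_score = None, math.inf
    for k in range(num):
        h = h0 * lo * (hi / lo) ** (k / (num - 1))
        score = lscv_score(x, h, diffs)
        if score < best_score:
            best_h, best_score = h, score
    return best_h


if __name__ == "__main__":
    rng = random.Random(1)
    # Bimodal normal mixture: a case where the normal rule tends to oversmooth.
    x = [rng.gauss(-2.0, 0.5) if rng.random() < 0.5 else rng.gauss(2.0, 0.5)
         for _ in range(100)]
    print("normal reference h:", round(normal_reference_bandwidth(x), 3))
    print("LSCV h:", round(lscv_bandwidth(x), 3))
```

Taking the global minimum over a fixed grid avoids being trapped by a spurious local minimum of the LSCV function, but it does nothing about the sampling variability noted above; rerunning with different seeds is a quick way to see how much the LSCV choice moves between samples. The O(n^2) pairwise loop is adequate for a few hundred points but not for large samples.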

[1] M. Woodroofe, On Choosing a Delta-Sequence, 1970.

[2] D. W. Scott et al., Kernel density estimation revisited, 1977.

[3] P. Deheuvels, Estimation non paramétrique de la densité par histogrammes généralisés [Nonparametric density estimation by generalized histograms], 1977.

[4] S. Weisberg, Applied Linear Regression, 1981.

[5] T. Hassard et al., Applied Linear Regression, 2005.

[6] M. Rudemo, Empirical Choice of Histograms and Kernel Density Estimators, 1982.

[7] Ian Abramson, On Bandwidth Variation in Kernel Estimates: A Square Root Law, 1982.

[8] S. Sheather, A data-based algorithm for choosing the window width when estimating the density at a point, 1983.

[9] A. Bowman, An alternative method of cross-validation for the smoothing of density estimates, 1984.

[10] D. W. Scott et al., Oversmoothed Nonparametric Density Estimates, 1985.

[11] Luc Devroye et al., Nonparametric Density Estimation, 1985.

[12] A. Bowman, A comparative study of some kernel-based nonparametric density estimators, 1985.

[13] B. Silverman, Density estimation for statistics and data analysis, 1986.

[14] Bernard W. Silverman et al., Density Estimation for Statistics and Data Analysis, 1987.

[15] J. Marron et al., Extent to which least-squares cross-validation minimises integrated square error in nonparametric density estimation, 1987.

[16] James Stephen Marron et al., On the Amount of Noise Inherent in Bandwidth Selection for a Kernel Density Estimator, 1987.

[17] D. W. Scott et al., Biased and Unbiased Cross-Validation in Density Estimation, 1987.

[18] L. Devroye et al., Nonparametric density estimation: the L1 view, 1987.

[19] P. Hall, On Kullback-Leibler loss and density estimation, 1987.

[20] D. Donoho, One-sided inference about functionals of a density, 1988.

[21] A. Izenman et al., Philatelic Mixtures and Multimodal Densities, 1988.

[22] J. Marron, Automatic smoothing parameter selection: A survey, 1988.

[23] James Stephen Marron et al., Comparison of data-driven bandwidth selectors, 1988.

[24] W. Härdle et al., How Far are Automatically Chosen Regression Smoothing Parameters from their Optimum, 1988.

[25] Matt P. Wand et al., Minimizing L1 distance in nonparametric density estimation, 1988.

[26] Matt P. Wand et al., On the minimization of absolute distance in kernel density estimation, 1988.

[27] Thomas J. DiCiccio et al., On Smoothing and the Bootstrap, 1989.

[28] Charles C. Taylor et al., Bootstrap choice of the smoothing parameter in kernel density estimation, 1989.

[29] L. Devroye, The double kernel method in density estimation, 1989.

[30] Wolfgang Härdle et al., Nonparametric Curve Estimation from Time Series, 1989.

[31] Shean-Tsong Chiu et al., On the asymptotic distributions of bandwidth estimates, 1990.

[32] G. A. Young, Alternative smoothed bootstraps, 1990.

[33] J. Stephen Marron et al., Bootstrap bandwidth selection, 1990.

[34] G. Terrell, The Maximal Smoothing Principle in Density Estimation, 1990.

[35] J. Faraway et al., Bootstrap choice of bandwidth for density estimation, 1990.

[36] James Stephen Marron et al., Local minima in cross validation functions, 1991.

[37] Shean-Tsong Chiu et al., Bandwidth selection for kernel density estimation, 1991.

[38] M. C. Jones, Kernel density estimation for length biased data, 1991.

[39] Shean-Tsong Chiu et al., Some stabilized bandwidth selectors for nonparametric regression, 1991.

[40] James Stephen Marron et al., Transformations in Density Estimation, 1991.

[41] M. C. Jones, The roles of ISE and MISE in density estimation, 1991.

[42] Root N Bandwidth Selection, 1991.

[43] M. C. Jones et al., A reliable data-based bandwidth selection method for kernel density estimation, 1991.

[44] M. C. Jones et al., On optimal data-based bandwidth selection in kernel density estimation, 1991.

[45] Estimation of integrated squared spectral density derivatives, 1991.

[46] M. C. Jones et al., On a class of kernel density estimate bandwidth selectors, 1991.

[47] Simon J. Sheather et al., Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives, 1991.

[48] Aplicaciones y nuevos resultados del método bootstrap en la estimación no paramétrica de curvas [Applications and new results of the bootstrap method in nonparametric curve estimation], 1991.

[49] James Stephen Marron et al., A simple root n bandwidth selector, 1991.

[50] T. Gasser et al., A Flexible and Fast Method for Automatic Smoothing, 1991.

[51] James Stephen Marron et al., Lower bounds for bandwidth selection in density estimation, 1991.

[52] Shean-Tsong Chiu et al., The effect of discretization error on bandwidth selection for kernel density estimation, 1991.

[53] Brian Kent Aldershof et al., Estimation of integrated squared density derivatives, 1991.

[54] Winfried Stute, Modified cross-validation in density estimation, 1992.

[55] Shean-Tsong Chiu, An automatic bandwidth selector for kernel density estimation, 1992.

[56] Bootstrap optimal bandwidth selection for kernel density estimates, 1992.

[57] L. Goldstein et al., Optimal Plug-in Estimators for Nonparametric Functional Estimation, 1992.

[58] James Stephen Marron et al., Best Possible Constant for Bandwidth Selection, 1992.

[59] Bert van Es, Asymptotics for Least Squares Cross-Validation Bandwidths in Nonsmooth Cases, 1992.

[60] James Stephen Marron et al., Regression smoothing parameters that are not far from their optimum, 1992.

[61] Potential for automatic bandwidth choice in variations on kernel density estimation, 1992.

[62] James Stephen Marron et al., On the use of pilot estimators in bandwidth selection, 1992.

[63] J. Marron et al., Smoothed cross-validation, 1992.

[64] I. Johnstone et al., Empirical functionals and efficient smoothing parameter selection, 1992.

[65] B. Turlach et al., Rejoinder to "Practical performance of several data driven bandwidth selectors", 1992.

[66] M. Wand et al., Exact mean integrated squared error, 1992.

[67] Simon J. Sheather et al., Local Bandwidth Selection for Density Estimation, 1992.

[68] A. Cuevas et al., A comparative study of several smoothing methods in density estimation, 1994.

[69] James Stephen Marron et al., Asymptotically best bandwidth selectors in kernel density estimation, 1994.

[70] Joachim Engel et al., An iterative bandwidth selector for kernel estimation of densities and their derivatives, 1994.

[71] Scale measures for bandwidth selection, 1995.

[72] W. R. Schucany et al., Adaptive Bandwidth Choice for Kernel Regression, 1995.