Improving Sheather and Jones’ bandwidth selector for difficult densities in kernel density estimation

Kernel density estimation is a widely used statistical tool, and bandwidth selection is critically important to it. The Sheather and Jones (SJ) selector [A reliable data-based bandwidth selection method for kernel density estimation, J. R. Stat. Soc. Ser. B 53 (1991), pp. 683–690] remains the best available data-driven bandwidth selector. It can, however, perform poorly if the shape of the true density deviates too much from normal. This paper first develops an alternative selector by following ideas in Park and Marron [On the use of pilot estimators in bandwidth selection, Nonparametr. Stat. 1 (1992), pp. 231–240] to reduce the impact of the normal reference density. The alternative selector can bring drastic improvement for the less smooth densities that the SJ selector has difficulty with, but may perform slightly worse otherwise. We then propose combining the alternative selector and the SJ selector by using the smaller of the two bandwidths, which has the effect of automatically picking the better one for a particular density. In our extensive simulation study using the 15 benchmark densities in Marron and Wand [Exact mean integrated squared error, Ann. Statist. 20 (1992), pp. 712–736], the combined selector improves significantly on the SJ selector for 5 difficult densities and retains the superior performance of the SJ selector for the other 10. A ready-to-use R function is provided.
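The combination rule described above (use the smaller of the two bandwidths) can be illustrated with a minimal Python sketch. This is not the authors' R function: the names `combined_bandwidth` and `gaussian_kde` are hypothetical, and the two input bandwidths are placeholder numbers standing in for the output of the SJ selector and the alternative selector, neither of which is implemented here.

```python
import math


def gaussian_kde(data, h):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian kernels centred at each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def combined_bandwidth(h_sj, h_alt):
    """Combination rule from the abstract: take the smaller bandwidth."""
    return min(h_sj, h_alt)


# Toy data and placeholder bandwidths (assumed values, not computed by
# either selector; in practice each would come from its own procedure).
sample = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.1, 2.5]
h_sj = 0.60   # stand-in for the SJ bandwidth
h_alt = 0.45  # stand-in for the alternative selector's bandwidth

h = combined_bandwidth(h_sj, h_alt)
f_hat = gaussian_kde(sample, h)
```

Here `f_hat(x)` evaluates the estimate at a point `x` using the combined bandwidth. Taking the minimum favours the less smoothed estimate, which is the behaviour the abstract reports as helpful for densities where the normal reference leads the SJ selector to oversmooth.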

[1]  Matthew P. Wand,et al.  Kernel Smoothing , 1995 .

[2]  A. V. D. Vaart,et al.  Asymptotic Statistics: Frontmatter , 1998 .

[3]  G. Terrell The Maximal Smoothing Principle in Density Estimation , 1990 .

[4]  David W. Scott,et al.  Multivariate Density Estimation: Theory, Practice, and Visualization , 1992, Wiley Series in Probability and Statistics.

[5]  D. W. Scott,et al.  Multivariate Density Estimation, Theory, Practice and Visualization , 1992 .

[6]  James Stephen Marron,et al.  On the use of pilot estimators in bandwidth selection , 1992 .

[7]  Ramani Duraiswami,et al.  Fast optimal bandwidth selection for kernel density estimation , 2006, SDM.

[8]  M. Wand,et al.  EXACT MEAN INTEGRATED SQUARED ERROR , 1992 .

[9]  M. C. Jones,et al.  A Brief Survey of Bandwidth Selection for Density Estimation , 1996 .

[10]  C. D. Kemp,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[11]  M. C. Jones,et al.  A reliable data-based bandwidth selection method for kernel density estimation , 1991 .

[12]  M. C. Jones,et al.  On optimal data-based bandwidth selection in kernel density estimation , 1991 .

[13]  S. Sheather Density Estimation , 2004 .

[14]  D. W. Scott,et al.  Oversmoothed Nonparametric Density Estimates , 1985 .

[15]  Brian D. Ripley,et al.  Modern applied statistics with S, 4th Edition , 2002, Statistics and computing.

[16]  Christina Gloeckner,et al.  Modern Applied Statistics With S , 2003 .

[17]  M. C. Jones,et al.  Universal smoothing factor selection in density estimation: theory and practice , 1997 .

[18]  James Stephen Marron,et al.  Comparison of data-driven bandwidth selectors , 1988 .

[19]  Simon J. Sheather,et al.  Using non stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives , 1991 .

[20]  D. W. Scott,et al.  Biased and Unbiased Cross-Validation in Density Estimation , 1987 .

[21]  P. J. Green,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[22]  Anne Lohrli Chapman and Hall , 1985 .