Bandwidth selection for kernel distribution function estimation

Abstract

Leave-one-out cross-validation is a popular and readily implemented heuristic for bandwidth selection in nonparametric smoothing problems. In this note we elucidate the role of leave-one-out selection criteria by discussing a criterion introduced by Sarda (J. Statist. Plann. Inference 35 (1993) 65–75) for bandwidth selection for kernel distribution function estimators (KDFEs). We show that, for this problem, using the leave-one-out KDFE in the selection procedure is asymptotically equivalent to leaving none out. This contrasts with kernel density estimation, where using the leave-one-out density estimator in the selection procedure is critical. Unfortunately, simulations show that neither method works in practice, even for samples as large as 1000. In fact, we show that for any fixed bandwidth, the expected value of the derivative of the leave-none-out criterion is asymptotically positive. This result and our simulations suggest that the criteria are increasing in the bandwidth and that, for sufficiently large samples (e.g., n = 100), the smallest available bandwidth will always be selected, contradicting Sarda's optimality result for this estimator. As an alternative to minimizing a selection criterion, we propose a plug-in estimator of the asymptotically optimal bandwidth. Simulations suggest that the plug-in is a good estimator of the asymptotically optimal bandwidth even for samples as small as 10 observations, and that it is not far from the finite-sample optimal bandwidth.
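To make the two selection criteria and the plug-in idea concrete, here is a minimal standard-library Python sketch. It makes several assumptions that are not taken from the paper. It uses a Gaussian kernel. It takes Sarda's criterion to be the sample average of [F_n(X_i) - Fhat(X_i)]^2, where F_n is the empirical distribution function, with the weight function set to 1. It also uses the minimizer of the asymptotic MISE, n^{-1} int F(1-F) - n^{-1} h psi(K) + h^4 mu2(K)^2 R(f')/4, which is h = [psi(K) / (mu2(K)^2 R(f') n)]^{1/3} with R(f') = int f'(x)^2 dx. The function names (kdfe, sarda_criterion, amise_optimal_h, estimate_r_fprime, normal_reference_h) are hypothetical. The kernel estimate of R(f') with a normal-reference pilot bandwidth is a generic functional-estimation step and not necessarily the plug-in proposed in this paper.

```python
import math
import random
import statistics

SQRT_PI = math.sqrt(math.pi)
SQRT_2PI = math.sqrt(2.0 * math.pi)


def norm_cdf(u: float) -> float:
    """Standard normal CDF: the integrated Gaussian kernel."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def ecdf(x: float, data: list[float]) -> float:
    """Empirical distribution function F_n(x)."""
    return sum(1 for xi in data if xi <= x) / len(data)


def kdfe(x: float, data: list[float], h: float) -> float:
    """Kernel distribution function estimate (Gaussian kernel, bandwidth h)."""
    return sum(norm_cdf((x - xi) / h) for xi in data) / len(data)


def sarda_criterion(data: list[float], h: float, leave_one_out: bool = True) -> float:
    """Average of [F_n(X_i) - Fhat(X_i)]^2 over the sample, weight W = 1 (assumed).

    leave_one_out=True uses the KDFE built without X_i; False uses all n points.
    """
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        if leave_one_out:
            s = sum(norm_cdf((xi - xj) / h) for j, xj in enumerate(data) if j != i)
            fhat = s / (n - 1)
        else:
            fhat = kdfe(xi, data, h)
        total += (ecdf(xi, data) - fhat) ** 2
    return total / n


def amise_optimal_h(r_fprime: float, n: int) -> float:
    """Minimizer of the AMISE of a Gaussian-kernel KDFE:
    h = [psi(K) / (mu2(K)^2 * R(f') * n)]^(1/3), psi(K) = 1/sqrt(pi), mu2(K) = 1."""
    return (1.0 / (SQRT_PI * r_fprime * n)) ** (1.0 / 3.0)


def normal_reference_h(data: list[float]) -> float:
    """Plug in R(f') = 1/(4 sqrt(pi) sigma^3) for N(mu, sigma^2): h = (4/n)^(1/3) sigma."""
    return (4.0 / len(data)) ** (1.0 / 3.0) * statistics.stdev(data)


def estimate_r_fprime(data: list[float]) -> float:
    """Kernel estimate of R(f') = int f'(x)^2 dx = -E[f''(X)].

    The pilot bandwidth g is the AMSE-optimal normal-reference choice for this
    functional with a Gaussian kernel (an assumption, not taken from the paper).
    """
    n = len(data)
    g = statistics.stdev(data) * (16.0 / (3.0 * math.sqrt(2.0) * n)) ** 0.2
    total = 0.0
    for xi in data:
        for xj in data:
            u = (xi - xj) / g
            total += (u * u - 1.0) * math.exp(-0.5 * u * u) / SQRT_2PI  # phi''(u)
    r = -total / (n * n * g ** 3)
    if r <= 0.0:
        raise ValueError("non-positive estimate of R(f'); sample too small?")
    return r


if __name__ == "__main__":
    random.seed(1)
    sample = [random.gauss(0.0, 1.0) for _ in range(100)]
    grid = [0.05 * k for k in range(1, 21)]
    for label, loo in (("leave-one-out", True), ("leave-none-out", False)):
        scores = [sarda_criterion(sample, h, loo) for h in grid]
        print(f"{label}: minimized at h = {grid[scores.index(min(scores))]:.2f}")
    h_pi = amise_optimal_h(estimate_r_fprime(sample), len(sample))
    print(f"plug-in h = {h_pi:.3f}, normal reference h = {normal_reference_h(sample):.3f}")
```

Scanning a bandwidth grid this way lets one check whether either criterion has an interior minimum or simply favours the smallest bandwidth on the grid. The sketch itself makes no claim about what a particular sample will show. The normal-reference rule is included as a closed-form special case: substituting the normal value of R(f') into amise_optimal_h gives (4/n)^{1/3} sigma.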

[1] James Stephen Marron, et al. Estimation of integrated squared density derivatives. 1987.

[2] Brian Kent Aldershof, et al. Estimation of integrated squared density derivatives. 1991.

[3] James Stephen Marron, et al. A simple root n bandwidth selector. 1991.

[4] Ulrich Stadtmüller, et al. Bandwidth choice and confidence intervals for derivatives of noisy data. 1987.

[5] M. Rudemo. Empirical Choice of Histograms and Kernel Density Estimators. 1982.

[6] T. Gasser, et al. A Flexible and Fast Method for Automatic Smoothing. 1991.

[7] M. Stone. Cross-Validatory Choice and Assessment of Statistical Predictions. 1976.

[8] M. Woodroofe. On Choosing a Delta-Sequence. 1970.

[9] M. C. Jones, et al. A reliable data-based bandwidth selection method for kernel density estimation. 1991.

[10] P. Sarda. Smoothing parameter selection for smooth distribution functions. 1993.

[11] W. Härdle, et al. Optimal Bandwidth Selection in Nonparametric Regression Function Estimation. 1985.

[12] J. Wellner, et al. Empirical Processes with Applications to Statistics. 2009.

[13] E. Nadaraya. On Estimating Regression. 1964.

[14] C. J. Stone, et al. An Asymptotically Optimal Window Selection Rule for Kernel Density Estimates. 1984.

[15] C. J. Stone, et al. Consistent Nonparametric Regression. 1977.

[16] M. Wand, et al. Exact mean integrated squared error. 1992.

[17] A. Bowman. An alternative method of cross-validation for the smoothing of density estimates. 1984.

[18] Naomi S. Altman, et al. Cross-validation, the Bootstrap, and Related Methods for Tuning Parameter Selection. 1994.

[19] G. Wahba, et al. A completely automatic French curve: fitting spline functions by cross validation. 1975.

[20] James Stephen Marron, et al. Comparison of data-driven bandwidth selectors. 1988.

[21] R. M. Clark. A calibration curve for radiocarbon dates. Antiquity, 1975.