A Cross-Validation Bandwidth Choice for Kernel Density Estimates with Selection Biased Data

This paper studies the risks and bandwidth choices of a kernel estimate of the underlying density when the data are obtained from s independent biased samples. The main results give asymptotic representations of the integrated squared error and the mean integrated squared error of the estimate, and establish a cross-validation criterion for bandwidth selection. This kernel density estimate is shown to be asymptotically superior to many other intuitive kernel density estimates. The data-driven cross-validation bandwidth is shown to be asymptotically optimal in the sense of Stone (1984, Ann. Statist. 12, 1285-1297). The finite-sample properties of the cross-validation bandwidth are investigated through a Monte Carlo simulation.
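The abstract does not spell out the estimator, so the following is only a minimal sketch under simplifying assumptions: a single biased sample (s = 1) drawn from g(y) = w(y) f(y) / mu with a known weight function w, a Gaussian kernel K_h, and the weighted estimate f_hat(x) = (mu_hat / n) sum_i K_h(x - Y_i) / w(Y_i) with mu_hat = n / sum_i 1/w(Y_i), in the spirit of Jones (1991). The bandwidth is chosen by minimising the least-squares cross-validation score CV(h) = int f_hat^2 - 2 (mu_hat / n) sum_i f_hat_{-i}(Y_i) / w(Y_i), which estimates the integrated squared error up to the term int f^2 that does not depend on h (cf. Rudemo, 1982; Bowman, 1984). The function names, the bandwidth grid and the simulated length-biased Gamma data are hypothetical illustrations, not the paper's s-sample procedure.

```python
import numpy as np

# Assumptions (illustrative only): one biased sample (s = 1), known weight
# function w > 0, Gaussian kernel, bandwidth chosen from a fixed grid.


def gaussian_pdf(u, scale):
    """Density of N(0, scale^2) evaluated elementwise at u."""
    return np.exp(-0.5 * (u / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))


def mu_hat(y, w):
    """Harmonic-mean estimate of mu = E_f[w(X)] from the biased sample."""
    return len(y) / np.sum(1.0 / w(y))


def biased_kde(x, y, w, h):
    """Weighted KDE of f at x: (mu_hat / n) * sum_i K_h(x - Y_i) / w(Y_i)."""
    inv_w = 1.0 / w(y)
    coef = mu_hat(y, w) / len(y)
    diffs = np.subtract.outer(np.asarray(x, dtype=float), y)
    return coef * gaussian_pdf(diffs, h) @ inv_w


def cv_score(y, w, h):
    """Least-squares CV criterion: estimate of ISE(h) minus int f^2."""
    n = len(y)
    inv_w = 1.0 / w(y)
    m = mu_hat(y, w)
    d = np.subtract.outer(y, y)
    # int f_hat^2 dx: two N(0, h^2) kernels convolve to N(0, 2 h^2).
    int_fhat_sq = (m / n) ** 2 * inv_w @ gaussian_pdf(d, np.sqrt(2.0) * h) @ inv_w
    # Leave-one-out values f_hat_{-i}(Y_i), reusing the full-sample mu_hat.
    k = gaussian_pdf(d, h)
    np.fill_diagonal(k, 0.0)
    loo = (m / (n - 1)) * (k @ inv_w)
    # int f_hat f dx = mu * E_g[f_hat(Y) / w(Y)], estimated by a sample mean.
    cross = (m / n) * np.sum(loo * inv_w)
    return int_fhat_sq - 2.0 * cross


def select_bandwidth(y, w, grid):
    """Return the grid bandwidth minimising the CV score, plus all scores."""
    scores = np.array([cv_score(y, w, h) for h in grid])
    return grid[np.argmin(scores)], scores


def length_weight(t):
    """Length-biased sampling: w(x) = x."""
    return t


rng = np.random.default_rng(0)
# Length-biased sampling from f = Gamma(3, 1) yields Y ~ Gamma(4, 1).
y = rng.gamma(shape=4.0, scale=1.0, size=500)
grid = np.linspace(0.05, 2.0, 40)
h_cv, scores = select_bandwidth(y, length_weight, grid)
print(f"CV bandwidth: {h_cv:.3f}, mu_hat: {mu_hat(y, length_weight):.3f}")
```

Reusing the full-sample mu_hat inside the leave-one-out terms keeps the sketch short; a more careful version might recompute it with the i-th observation removed.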

[1] M. C. Jones, Kernel density estimation for length biased data, 1991.

[2] L. Devroye, et al., Nonparametric Density Estimation: The L1 View, 1985.

[3] Ganapati P. Patil, et al., Probing Encountered Data, Meta Analysis and Weighted Distribution Methods, 1989.

[4] M. Rudemo, Empirical Choice of Histograms and Kernel Density Estimators, 1982.

[5] Colin O. Wu, et al., Minimax kernels for density estimation with biased data, 1996.

[6] C. J. Stone, et al., An Asymptotically Optimal Window Selection Rule for Kernel Density Estimates, 1984.

[7] Y. Vardi, Empirical Distributions in Selection Bias Models, 1985.

[8] N. Jewell, et al., Regression analysis based on stratified samples, 1986.

[9] K. Do, et al., Efficient and Adaptive Estimation for Semiparametric Models, 1994.

[10] Stephan Morgenthaler, et al., Choice-based samples: A non-parametric approach, 1986.

[11] M. C. Jones, et al., On optimal data-based bandwidth selection in kernel density estimation, 1991.

[12] W. Härdle, et al., Optimal Bandwidth Selection in Nonparametric Regression Function Estimation, 1985.

[13] Nicholas P. Jewell, et al., Least squares regression with data arising from stratified samples of the dependent variable, 1985.

[14] P. Hall, Large Sample Optimality of Least Squares Cross-Validation in Density Estimation, 1983.

[15] Jianqing Fan, et al., On curve estimation by minimizing mean absolute deviation and its implications, 1994.

[16] C. J. Stone, et al., Optimal Rates of Convergence for Nonparametric Estimators, 1980.

[17] James Stephen Marron, et al., Random approximations to some measures of accuracy in nonparametric curve estimation, 1986.

[18] On Good Deterministic Smoothing Sequences for Kernel Density Estimates, 1994.

[19] N. L. Johnson, et al., New developments in survey sampling, 1970.

[20] Ibrahim A. Ahmad, et al., On multivariate kernel estimation for samples from weighted distributions, 1995.

[21] J. Marron, An Asymptotically Efficient Solution to the Bandwidth Problem of Kernel Density Estimation, 1985.

[22] Yadolah Dodge, et al., Statistical data analysis and inference, 1992.

[23] C. J. Stone, et al., Optimal Global Rates of Convergence for Nonparametric Regression, 1982.

[24] James Stephen Marron, et al., A Comparison of Cross-Validation Techniques in Density Estimation, 1987.

[25] Richard D. Gill, et al., Large sample theory of empirical distributions in biased sampling models, 1988.

[26] James Stephen Marron, et al., Best Possible Constant for Bandwidth Selection, 1992.

[27] A. Bowman, An alternative method of cross-validation for the smoothing of density estimates, 1984.