A Bayesian approach to bandwidth selection for multivariate kernel density estimation

Kernel density estimation for multivariate data is an important technique with a wide range of applications. However, it has received significantly less attention than its univariate counterpart, mainly because deriving an optimal data-driven bandwidth becomes increasingly difficult as the dimension of the data increases. We provide Markov chain Monte Carlo (MCMC) algorithms for estimating optimal bandwidth matrices for multivariate kernel density estimation. Our approach treats the elements of the bandwidth matrix as parameters whose posterior density can be obtained through the likelihood cross-validation criterion. Numerical studies for bivariate data show that the MCMC algorithm generally performs better than the plug-in algorithm under the Kullback-Leibler information criterion, and performs as well as the plug-in algorithm under the mean integrated squared error (MISE) criterion. Numerical studies for five-dimensional data show that our algorithm is superior to the normal reference rule. Our MCMC algorithm is the first data-driven bandwidth selector for multivariate kernel density estimation that is applicable to data of any dimension.
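The idea of treating bandwidths as parameters with a posterior built from likelihood cross-validation can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a Gaussian product kernel with a diagonal bandwidth matrix, a prior proportional to 1/(1 + h^2) on each bandwidth, and a random-walk Metropolis sampler on the log-bandwidths. The function names, the prior, the step size, and the iteration counts are all illustrative choices.

```python
import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a Gaussian product-kernel density
    estimate with diagonal bandwidths h (one per dimension)."""
    n, d = x.shape
    diff = (x[:, None, :] - x[None, :, :]) / h           # shape (n, n, d)
    log_k = (-0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(h)) - 0.5 * d * np.log(2 * np.pi))
    np.fill_diagonal(log_k, -np.inf)                      # drop the j == i terms
    m = log_k.max(axis=1, keepdims=True)                  # log-sum-exp for stability
    log_f = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n - 1)
    return log_f.sum()

def log_posterior(theta, x):
    """Unnormalised log-posterior of theta = log(h)."""
    h = np.exp(theta)
    log_prior = -np.sum(np.log1p(h ** 2))                 # assumed prior, 1 / (1 + h^2)
    # np.sum(theta) is the log-Jacobian of the change of variables h = exp(theta).
    return loo_log_likelihood(x, h) + log_prior + np.sum(theta)

def sample_bandwidths(x, n_iter=2000, burn_in=500, step=0.1, seed=0):
    """Random-walk Metropolis on log-bandwidths; returns the posterior-mean bandwidths."""
    rng = np.random.default_rng(seed)
    theta = np.log(x.std(axis=0))                         # start at the marginal std devs
    current = log_posterior(theta, x)
    draws = []
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_posterior(proposal, x)
        if np.log(rng.uniform()) < candidate - current:   # Metropolis accept/reject
            theta, current = proposal, candidate
        if t >= burn_in:
            draws.append(np.exp(theta))
    return np.mean(draws, axis=0)
```

Under these assumptions, `sample_bandwidths(data)` takes an `(n, d)` array and returns one bandwidth per dimension. A full bandwidth matrix, as described in the abstract, would also require sampling off-diagonal elements, which this sketch omits.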
