Self-tuning density estimation based on Bayesian averaging of adaptive kernel density estimations yields state-of-the-art performance

Abstract Non-parametric probability density function (pdf) estimation is a general problem encountered in many fields. A promising alternative to the dominant solutions, kernel density estimation (KDE) and Gaussian mixture modeling, is adaptive KDE, in which each kernel is given an individual bandwidth adjusted to the local data density. Traditionally, the bandwidths are selected by a non-linear transformation of a pilot pdf estimate, with parameters that control the scaling, but identifying parameter values that yield competitive performance has turned out to be non-trivial. We present a new self-tuning (parameter-free) pdf estimation method called adaptive density estimation by Bayesian averaging (ADEBA) that approximates pdf estimates in the form of weighted model averages across all possible parameter values, weighted by their Bayesian posterior calculated from the data. ADEBA is shown to be simple, robust, and competitive with current practice, and it generalizes easily to multivariate distributions. An implementation of the method for R is publicly available.
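The general idea described in the abstract can be illustrated with a minimal one-dimensional sketch in Python. It is not the ADEBA implementation, and several details are assumptions made only for illustration: the bandwidth transformation h_i = alpha * h0 * (pilot(x_i) / g)^(-beta), with g the geometric mean of the pilot values; a Silverman rule-of-thumb pilot bandwidth h0; a uniform prior over a small hypothetical grid of (alpha, beta) values; and a leave-one-out likelihood standing in for the data term of the posterior.

```python
import numpy as np

def gauss_kde(x_eval, data, bandwidths):
    """Gaussian KDE in 1-D with one bandwidth per data point."""
    x_eval = np.asarray(x_eval, dtype=float)
    z = (x_eval[:, None] - data[None, :]) / bandwidths[None, :]
    k = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * bandwidths[None, :])
    return k.mean(axis=1)

def adaptive_bandwidths(pilot_at_data, h0, alpha, beta):
    """Assumed transformation: h_i = alpha * h0 * (pilot_i / g)^(-beta)."""
    g = np.exp(np.mean(np.log(pilot_at_data)))  # geometric mean of pilot values
    return alpha * h0 * (pilot_at_data / g) ** (-beta)

def loo_log_likelihood(data, bandwidths):
    """Leave-one-out log-likelihood of the data under the adaptive KDE."""
    n = data.size
    z = (data[:, None] - data[None, :]) / bandwidths[None, :]
    k = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * bandwidths[None, :])
    np.fill_diagonal(k, 0.0)                    # drop each point's own kernel
    dens = k.sum(axis=1) / (n - 1)
    return float(np.sum(np.log(dens + 1e-300)))  # guard against log(0)

def averaged_density(x_eval, data, alphas, betas):
    """Average adaptive KDEs over a parameter grid, weighted by posterior."""
    data = np.asarray(data, dtype=float)
    n = data.size
    h0 = 1.06 * data.std(ddof=1) * n ** (-0.2)  # Silverman rule of thumb
    pilot = gauss_kde(data, data, np.full(n, h0))
    log_post, estimates = [], []
    for a in alphas:
        for b in betas:
            h = adaptive_bandwidths(pilot, h0, a, b)
            log_post.append(loo_log_likelihood(data, h))  # uniform prior
            estimates.append(gauss_kde(x_eval, data, h))
    log_post = np.array(log_post)
    w = np.exp(log_post - log_post.max())       # stabilised exponentiation
    w /= w.sum()                                # normalised posterior weights
    return w @ np.array(estimates), w

# Example: a bimodal sample with components of different widths.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 1.0, 100)])
grid = np.linspace(-15.0, 15.0, 3001)
density, weights = averaged_density(grid, sample, (0.5, 1.0, 2.0), (0.0, 0.25, 0.5))
```

With beta = 0 the transformation reduces to ordinary fixed-bandwidth KDE, so the grid spans both fixed and adaptive estimators, and the averaging step weights them according to how well they predict held-out points rather than requiring a single parameter choice.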
