Non-asymptotic Bandwidth Selection for Density Estimation of Discrete Data

We propose a new method for density estimation of categorical data. The method implements a non-asymptotic, data-driven bandwidth selection rule and provides model sparsity that the standard kernel density estimator lacks. Numerical experiments on a well-known ten-dimensional binary medical data set illustrate the effectiveness of the proposed approach for density estimation, discriminant analysis, and classification.
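
For orientation, the standard kernel estimator that the abstract uses as its point of comparison can be sketched for binary data as follows. This is a minimal illustration and not the proposed method: it assumes the Aitchison-Aitken kernel, which is a common choice for binary vectors, and it fixes the bandwidth `lam` by hand instead of selecting it from the data. The function names, the toy sample, and the value of `lam` are all hypothetical.

```python
from itertools import product


def aa_kernel(x, xi, lam):
    """Aitchison-Aitken kernel for binary vectors: lam^(d-h) * (1-lam)^h,
    where h is the Hamming distance between x and xi and 1/2 <= lam <= 1."""
    d = len(x)
    h = sum(a != b for a, b in zip(x, xi))  # number of differing coordinates
    return lam ** (d - h) * (1.0 - lam) ** h


def kde_binary(x, data, lam):
    """Standard kernel estimate of P(X = x): one equally weighted kernel
    per observation, averaged over the sample."""
    return sum(aa_kernel(x, xi, lam) for xi in data) / len(data)


# Hypothetical toy sample of 3-dimensional binary observations.
data = [(0, 0, 0), (0, 0, 1), (0, 0, 0), (1, 1, 0), (0, 0, 0)]
lam = 0.9  # illustrative bandwidth, chosen by hand (an assumption)

# Evaluate the estimate on every cell of the 2^3 binary cube.
cells = list(product((0, 1), repeat=3))
probs = {c: kde_binary(c, data, lam) for c in cells}
total = sum(probs.values())  # a valid estimate sums to one over all cells

for cell in cells:
    print(cell, round(probs[cell], 4))
```

With `lam = 1` the estimate reduces to the empirical cell frequencies, and with `lam = 0.5` it becomes uniform over all cells, so the bandwidth controls how much probability mass is shared between neighbouring cells. In this standard form every observation contributes its own kernel with weight `1/n`.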
