The Generalized Cross Entropy Method, with Applications to Probability Density Estimation

Nonparametric density estimation aims to determine the sparsest model that explains a given set of empirical data while making as few assumptions as possible. Many existing methods do not provide a sparse solution to the problem, and they rely on asymptotic approximations. In this paper we describe a framework for density estimation that uses information-theoretic measures of model complexity to construct a sparse density estimator that does not rely on large-sample approximations. The effectiveness of the approach is demonstrated on several well-known density estimation test cases.
