Abstract

We regard histogram density estimation as a model selection problem. Our approach is based on the information-theoretic minimum description length (MDL) principle. MDL-based model selection is formalized via the normalized maximum likelihood (NML) distribution, which has several desirable optimality properties. We show how this approach can be applied to learning generic, irregular (variable-width bin) histograms, and how to compute the model selection criterion efficiently. We also derive a dynamic programming algorithm for finding both the NML-optimal bin count and the cut point locations in polynomial time. Finally, we demonstrate our approach via simulation tests.

1 INTRODUCTION

Density estimation is one of the central problems in statistical inference and machine learning. Given a random sample of observations from an unknown density, the goal of histogram density estimation is to find a piecewise constant density that describes the data best according to some pre-determined criterion. Although histograms are very simple densities, they are very flexible and can model complex properties like multi-modality with a relatively small number of parameters. Furthermore, one does not need to assume any specific form for the underlying density function: given enough bins, a histogram estimator adapts to any kind of density.

Most existing methods for learning histogram densities assume that the bin widths are equal and concentrate only on finding the optimal bin count. These regular histograms are, however, often problematic. It has been argued [15] that regular histograms are only good for describing roughly uniform data. If the data distribution is strongly non-uniform, the bin count must necessarily be high if one wants to capture the details of the high-density portion of the data. This in turn means that an unnecessarily large number of bins is wasted in the low-density region.

To avoid the problems of regular histograms, one must allow the bins to be of variable width. For these irregular histograms, it is necessary to find the optimal set of cut points in addition to the number of bins, which naturally makes the learning problem essentially more difficult. To solve this problem, we regard histogram density estimation as a model selection task, where the cut point sets are considered as models. In this framework, one must first choose a set of candidate cut points, from which the optimal model is searched for. The quality of the cut point sets is then measured by some model selection criterion. A toy version of this search is sketched below.
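To make the search space concrete, the following Python sketch scores irregular histograms built from subsets of candidate cut points. It is only an illustration: the exhaustive enumeration and the BIC-style penalty used as the score are placeholders of ours, not the NML criterion or the dynamic programming algorithm developed in this paper, and the function names are hypothetical.

```python
import itertools
import math

def histogram_loglik(data, cuts):
    """Log-likelihood of the maximum likelihood (piecewise constant)
    histogram density defined by the sorted cut points `cuts`,
    which must bracket all the data."""
    n = len(data)
    total = 0.0
    last = cuts[-1]
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        # count the points in [lo, hi); the final bin is closed on the right
        h = sum(1 for x in data if lo <= x < hi or (hi == last and x == hi))
        if h > 0:                       # empty bins contribute nothing
            total += h * math.log(h / (n * (hi - lo)))
    return total

def best_cut_set(data, candidates, max_bins):
    """Exhaustive search over subsets of candidate cut points, scored by
    a BIC-style penalized likelihood (a placeholder criterion, NOT the
    NML criterion of this paper).  K bins have K - 1 free parameters."""
    lo, hi = min(candidates), max(candidates)
    inner = sorted(c for c in candidates if lo < c < hi)
    n = len(data)
    best, best_score = None, -math.inf
    for k in range(max_bins):           # k interior cuts -> k + 1 bins
        for subset in itertools.combinations(inner, k):
            cuts = [lo, *subset, hi]
            score = histogram_loglik(data, cuts) - 0.5 * k * math.log(n)
            if score > best_score:
                best, best_score = cuts, score
    return best

if __name__ == "__main__":
    import random
    random.seed(0)
    # strongly non-uniform sample: mass concentrated near zero
    data = [random.random() ** 4 for _ in range(200)]
    print(best_cut_set(data, [i / 10 for i in range(11)], max_bins=4))
```

Even this toy version makes the combinatorial difficulty visible: the number of cut point subsets grows exponentially in the number of candidates, which is precisely why a polynomial-time dynamic programming search is needed.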
Our approach is based on information theory, more specifically on the minimum encoding or minimum complexity methods. These methods perform induction by seeking a theory that allows the most compact encoding of both the theory and the available data. Intuitively speaking, this approach can be argued to produce the best possible model of the problem domain, since in order to produce the most efficient coding, one must capture all the regularities present in the domain. Consequently, the minimum encoding approach can be used for constructing a solid theoretical framework for statistical modeling.

The most well-founded formalization of the minimum encoding approach is the minimum description length (MDL) principle developed by Rissanen [10, 11, 12]. The main idea of this principle is to represent a set of models (a model class) by a single model imitating the behaviour of any model in the class. Such representative models are called universal. The universal model itself does not have to belong to the model class, and often it does not.
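To make the notion concrete, recall the standard (Shtarkov) definition of the universal model used in this paper, the normalized maximum likelihood distribution, stated here for discrete data; the symbols $\mathcal{M}$, $\hat{\theta}$, and $x^n$ denote the model class, its maximum likelihood estimator, and a data sequence of length $n$:

```latex
% NML distribution: the maximized likelihood of the observed data,
% normalized over all possible data sets of the same length n.
P_{\mathrm{NML}}(x^n \mid \mathcal{M}) =
  \frac{P\bigl(x^n \mid \hat{\theta}(x^n), \mathcal{M}\bigr)}
       {\sum_{y^n} P\bigl(y^n \mid \hat{\theta}(y^n), \mathcal{M}\bigr)}
```

The normalizer (a sum for discrete data, an integral in the continuous case) is the parametric complexity of $\mathcal{M}$, and the negative logarithm of $P_{\mathrm{NML}}$ is the stochastic complexity, which serves as the model selection criterion.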
REFERENCES

[1] Jorma Rissanen et al. Density estimation by stochastic complexity. IEEE Trans. Inf. Theory, 1992.
[2] Peter Grünwald et al. A tutorial introduction to the minimum description length principle. ArXiv, 2004.
[3] P. Kontkanen et al. Analyzing the stochastic complexity via tree polynomials, 2005.
[4] Y. Shtarkov. Aim functions and sequential estimation of the source model for universal coding, 1999.
[5] Jorma Rissanen et al. The Minimum Description Length Principle in Coding and Modeling. IEEE Trans. Inf. Theory, 1998.
[6] V. Balasubramanian. MDL, Bayesian Inference and the Geometry of the Space of Probability Distributions, 2006.
[7] Yves Rozenholc et al. How many bins should be put in a regular histogram, 2006.
[8] E. Hannan et al. On stochastic complexity and nonparametric density estimation, 1988.
[9] Jorma Rissanen et al. Strong optimality of the normalized ML models as universal codes and information in data. IEEE Trans. Inf. Theory, 2001.
[10] J. Rissanen et al. Modeling by Shortest Data Description. Automatica, 1978.
[11] T. Speed et al. Data compression and histograms, 1992.
[12] Henry Tirri et al. On the Behavior of MDL Denoising. AISTATS, 2005.
[13] Gaston H. Gonnet et al. On the Lambert W function. Adv. Comput. Math., 1996.
[14] D. Knuth et al. A recurrence related to trees, 1989.
[15] Jorma Rissanen et al. An MDL Framework for Data Clustering, 2005.
[16] Jorma Rissanen et al. Lectures on Statistical Modeling Theory, 2002.
[17] Jorma Rissanen et al. Fisher information and stochastic complexity. IEEE Trans. Inf. Theory, 1996.