Combining regular and irregular histograms by penalized likelihood

A new fully automatic procedure for constructing histograms is proposed. It builds both a regular and an irregular histogram and then chooses between the two. To choose the number of bins in the irregular histogram, two penalties motivated by recent work in model selection are proposed. The algorithm is described and a suitable tuning of the penalties is given. Finally, different versions of the procedure are compared with existing proposals over a wide range of densities and sample sizes. In the simulations, the squared Hellinger risk of the new procedure is always at most twice the risk of the best of the other methods. The procedure is implemented in the R package histogram, available from CRAN.
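
As an illustration of the general idea of choosing a number of bins by penalized likelihood, the following minimal Python sketch handles the regular case only. It is not the procedure of the paper and not the interface of the R package: the function names, the default AIC-type penalty D - 1, and the cap of about n / log(n) bins are all assumptions made for the example. The penalty is passed as a function so that other choices can be substituted.

```python
import math


def regular_log_likelihood(data, num_bins):
    """Maximized log-likelihood of a regular histogram on [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    if hi <= lo:
        raise ValueError("data must contain at least two distinct values")
    n = len(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Clamp the index so that the sample maximum falls in the last bin.
        j = min(int((x - lo) / width), num_bins - 1)
        counts[j] += 1
    # The histogram density on bin j is counts[j] / (n * width);
    # empty bins contribute nothing to the log-likelihood.
    return sum(c * math.log(c / (n * width)) for c in counts if c > 0)


def select_num_bins(data, max_bins=None, penalty=lambda d: d - 1):
    """Return the bin count D maximizing log-likelihood(D) - penalty(D).

    The default penalty (D - 1, an AIC-type choice) and the default cap
    on D are illustrative assumptions, not the penalties of the paper.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    if max_bins is None:
        max_bins = max(1, int(n / math.log(n)))  # assumed upper limit on D
    return max(
        range(1, max_bins + 1),
        key=lambda d: regular_log_likelihood(data, d) - penalty(d),
    )
```

Calling select_num_bins(sample) on a list of floats returns the selected number of bins. A combined procedure in the spirit of the abstract would compute an analogous penalized criterion for an irregular partition and keep whichever histogram scores better; that step is not shown here.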
