Nonasymptotic quasi-optimality of AIC and the slope heuristics in maximum likelihood estimation of density using histogram models

We consider nonparametric maximum likelihood estimation of a density using linear histogram models. More precisely, we investigate the optimality of model selection procedures via penalization when the number of models is polynomial in the number of observations. It turns out that the Slope Heuristics first formulated by Birgé and Massart [10] is satisfied under rather mild conditions on the density to be estimated and on the structure of the considered partitions, and that the minimal penalty is equivalent to half of the AIC penalty.
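As an illustration of penalized histogram selection, the following is a minimal sketch and not the paper's procedure. It assumes data supported on [0, 1], regular partitions with D equal-width bins, and a log-likelihood normalized by n, so that the AIC penalty reads (D - 1)/n and half of it, (D - 1)/(2n), plays the role of the minimal penalty mentioned above. The function names and the Beta-distributed sample are hypothetical choices made for the example.

```python
import numpy as np


def histogram_neg_loglik(x, n_bins):
    """Empirical risk (mean negative log-likelihood) of the histogram MLE.

    Assumes x lies in [0, 1] and uses a regular partition into n_bins bins.
    """
    n = x.size
    counts, _ = np.histogram(x, bins=n_bins, range=(0.0, 1.0))
    nonzero = counts[counts > 0]  # empty bins contribute nothing to the sum
    # On a bin of width 1/n_bins the MLE density equals count * n_bins / n.
    return -np.sum(nonzero / n * np.log(nonzero * n_bins / n))


def select_bins_aic(x, max_bins):
    """Return the bin count minimizing empirical risk + (D - 1)/n."""
    n = x.size
    dims = np.arange(1, max_bins + 1)
    risks = np.array([histogram_neg_loglik(x, d) for d in dims])
    aic_penalty = (dims - 1) / n      # AIC under the assumed normalization
    min_penalty = aic_penalty / 2.0   # half of AIC: the minimal penalty
    criterion = risks + aic_penalty
    return int(dims[np.argmin(criterion)]), risks, min_penalty


# Hypothetical sample: 500 draws from a Beta(2, 5) density on [0, 1].
rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=500)
best_d, risks, min_penalty = select_bins_aic(sample, max_bins=30)
print("selected number of bins:", best_d)
```

In practice the slope heuristics calibrates the penalty constant from the data rather than fixing it in advance; the sketch only shows the two reference penalty shapes side by side.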

[1]  S. S. Wilks The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses , 1938 .

[2]  H. Akaike,et al.  Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .

[3]  H. Akaike A new look at the statistical model identification , 1974 .

[4]  I. Csiszár $I$-Divergence Geometry of Probability Distributions and Minimization Problems , 1975 .

[5]  C. J. Stone Uniform Error Bounds Involving Logspline Models , 1989 .

[6]  A. Barron,et al.  Approximation of Density Functions by Sequences of Exponential Families , 1991 .

[7]  M. Talagrand,et al.  Probability in Banach spaces , 1991 .

[8]  C. Mallows More comments on $C_p$ , 1995 .

[9]  M. Ledoux On Talagrand's deviation inequalities for product measures , 1997 .

[10]  P. Massart,et al.  Risk bounds for model selection via penalization , 1999 .

[11]  A. Barron Limits of information, Markov chains, and projection , 2000, 2000 IEEE International Symposium on Information Theory (Cat. No.00CH37060).

[12]  Prabir Burman Estimation of equifrequency histograms , 2002 .

[13]  O. Bousquet A Bennett concentration inequality and its application to suprema of empirical processes , 2002 .

[14]  Thierry Klein Une inégalité de concentration à gauche pour les processus empiriques , 2002 .

[15]  Gwénaelle Castellan Density estimation via exponential model selection , 2003, IEEE Trans. Inf. Theory.

[16]  Imre Csiszár,et al.  Information projections revisited , 2000, IEEE Trans. Inf. Theory.

[17]  E. Rio,et al.  Concentration around the mean for maxima of empirical processes , 2005, math/0506594.

[18]  P. Massart,et al.  Concentration inequalities and model selection , 2007 .

[19]  P. Massart,et al.  Minimal Penalties for Gaussian Model Selection , 2007 .

[20]  Sylvain Arlot Technical Appendix to "V-Fold Cross-Validation Improved: V-Fold Penalization" , 2008, 0802.0566.

[21]  Charles J. Stone,et al.  An Asymptotically Optimal Histogram Selection Rule , 2008 .

[22]  Sylvain Arlot Model selection by resampling penalization , 2007, 0906.3124.

[23]  Pascal Massart,et al.  Data-driven Calibration of Penalties for Least-Squares Regression , 2008, J. Mach. Learn. Res.

[24]  Adrien Saumard The Slope Heuristics in Heteroscedastic Regression , 2011, 1104.1050.

[25]  S. Boucheron,et al.  A high-dimensional Wilks phenomenon , 2011 .

[26]  Adrien Saumard Optimal upper and lower bounds for the true and empirical excess risks in heteroscedastic least-squares regression , 2013, 1304.6691.