Optimal cross-validation in density estimation with the $L^{2}$-loss

We analyze the performance of cross-validation (CV) in the density estimation framework for two purposes: (i) risk estimation and (ii) model selection. The main focus is on the leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions greatly improve upon $V$-fold cross-validation in terms of both variability and computational complexity. From a theoretical point of view, the closed-form expressions also make it possible to study the performance of Lpo in terms of risk estimation. Leave-one-out (Loo), that is, Lpo with $p=1$, is proved to be optimal among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation and identification. For estimation with finite sample size $n$, optimality is achieved for $p$ large enough [with $p/n=o(1)$] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is established for Lpo as long as $p/n$ is suitably related to the rate of convergence of the best estimator in the collection: (i) $p/n\to1$ as $n\to+\infty$ with a parametric rate, and (ii) $p/n=o(1)$ with some nonparametric estimators. These theoretical results are validated by simulation experiments.
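
To illustrate what a closed-form Lpo expression buys, the following is a minimal sketch for the histogram, one simple instance of a projection estimator. The formula used here is a derivation for this special case under the $L^{2}$ contrast $\|t\|^{2}-2t(x)$, not a quotation from the paper; the function names `lpo_risk_histogram` and `lpo_risk_brute_force` are hypothetical, and all observations are assumed to lie inside the range covered by `edges`. The brute-force version enumerates every split and is only there as a check on small samples.

```python
from itertools import combinations

import numpy as np


def lpo_risk_histogram(x, edges, p):
    """Closed-form leave-p-out L2 risk estimate of a histogram (sketch).

    Assumes every observation falls inside [edges[0], edges[-1]].
    """
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = x.size
    if not 1 <= p <= n - 1:
        raise ValueError("p must satisfy 1 <= p <= n - 1")
    counts, _ = np.histogram(x, bins=edges)
    widths = np.diff(edges)
    freq = counts / n
    # Coefficients obtained by averaging over all (n choose p) splits.
    a = (2 * n - p) / ((n - 1) * (n - p))
    b = n * (n - p + 1) / ((n - 1) * (n - p))
    return a * np.sum(freq / widths) - b * np.sum(freq**2 / widths)


def lpo_risk_brute_force(x, edges, p):
    """Same quantity by explicit enumeration of all test sets of size p."""
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = x.size
    widths = np.diff(edges)
    n_bins = widths.size
    # Bin index of each observation (right edge goes to the last bin).
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    total = 0.0
    n_splits = 0
    for test in combinations(range(n), p):
        test = np.array(test)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        train_counts = np.bincount(bins[train_mask], minlength=n_bins)
        heights = train_counts / ((n - p) * widths)
        norm_sq = np.sum(heights**2 * widths)  # squared L2 norm of the estimator
        total += norm_sq - 2.0 * np.mean(heights[bins[test]])
        n_splits += 1
    return total / n_splits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.random(8)            # toy data on [0, 1)
    grid = np.linspace(0.0, 1.0, 4)   # three regular bins
    for p in (1, 2, 3):
        print(p, lpo_risk_histogram(sample, grid, p),
              lpo_risk_brute_force(sample, grid, p))
```

The closed form costs one pass over the bin counts for any $p$, whereas the enumeration visits $\binom{n}{p}$ splits, which is why the second function is usable only for very small $n$.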
