Model selection by resampling penalization

In this paper, a new family of resampling-based penalization procedures for model selection is defined in a general framework. It generalizes several methods, including Efron's bootstrap penalization and the leave-one-out penalization recently proposed by Arlot (2008), to any exchangeable weighted bootstrap resampling scheme. In the heteroscedastic regression framework, assuming the models to have a particular structure, these resampling penalties are proved to satisfy a non-asymptotic oracle inequality with leading constant close to 1. In particular, they are asymptotically optimal. Resampling penalties are used to define an estimator that adapts simultaneously to the smoothness of the regression function and to the heteroscedasticity of the noise. This is remarkable because resampling penalties are general-purpose devices that were not built specifically to handle heteroscedastic data; they adapt to heteroscedasticity naturally. A simulation study shows that resampling penalties improve on V-fold cross-validation in terms of final prediction error, in particular when the signal-to-noise ratio is not large.
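
To make the idea concrete, the following is a minimal sketch of a resampling penalty for histogram (piecewise-constant) least-squares regression on [0, 1), using Efron bootstrap weights as one example of an exchangeable weighting scheme. It is an illustration under stated assumptions, not the paper's implementation: the function names `bin_means`, `resampling_penalty` and `select_n_bins`, the equal-width bins, the Monte Carlo approximation of the expectation over weights, the normalizing constant `C = 1.0`, and the handling of bins emptied by resampling (their points are simply skipped) are all choices made here for illustration.

```python
import numpy as np


def bin_means(x_bins, y, w, n_bins):
    """Weighted mean of y in each bin; NaN where a bin has zero total weight."""
    w_sum = np.bincount(x_bins, weights=w, minlength=n_bins)
    wy_sum = np.bincount(x_bins, weights=w * y, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(w_sum > 0, wy_sum / w_sum, np.nan)


def resampling_penalty(x, y, n_bins, n_resamples=200, C=1.0, rng=None):
    """Monte Carlo estimate of C * E_W[P_n g(s_W) - P_n^W g(s_W)].

    g is the squared loss, s_W the histogram estimator fitted with weights W,
    P_n the empirical measure and P_n^W its weighted (resampled) version.
    C = 1.0 is an assumption made for Efron-type weights in this sketch.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    x_bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for _ in range(n_resamples):
        # Efron bootstrap weights: multinomial counts that sum to n.
        w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        fit = bin_means(x_bins, y, w, n_bins)[x_bins]
        sq_err = (y - fit) ** 2
        # Simplification: skip points whose bin received zero total weight.
        ok = ~np.isnan(sq_err)
        # Empirical loss minus resampled loss of the resampled estimator.
        total += np.sum((1.0 - w[ok]) * sq_err[ok]) / n
    return C * total / n_resamples


def select_n_bins(x, y, candidates, **kwargs):
    """Pick the number of bins minimizing empirical risk + resampling penalty."""
    n = len(y)
    crit = {}
    for d in candidates:
        x_bins = np.minimum((x * d).astype(int), d - 1)
        fit = bin_means(x_bins, y, np.ones(n), d)[x_bins]
        emp_risk = np.mean((y - fit) ** 2)
        crit[d] = emp_risk + resampling_penalty(x, y, d, **kwargs)
    return min(crit, key=crit.get), crit
```

A call such as `select_n_bins(x, y, [1, 2, 5, 10, 20, 50], n_resamples=100, rng=0)` returns the selected number of bins together with the penalized criterion for each candidate. Because the penalty is computed from the data in each bin, it can differ between regions with different noise levels, which is the intuition behind the adaptation to heteroscedasticity described above.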

[1]  H. Akaike Statistical predictor identification , 1970 .

[2]  H. Akaike,et al.  Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .

[3]  C. L. Mallows Some comments on C_p , 1973 .

[4]  M. Stone Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[5]  David M. Allen,et al.  The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction , 1974 .

[6]  H. G. Burchard,et al.  Piecewise polynomial approximation on optimal meshes , 1975 .

[7]  Seymour Geisser,et al.  The Predictive Sample Reuse Method with Applications , 1975 .

[8]  R. Lew Bounds on Negative Moments , 1976 .

[10]  C. J. Stone,et al.  Optimal Rates of Convergence for Nonparametric Estimators , 1980 .

[11]  R. Shibata An optimal selection of regression variables , 1981 .

[12]  B. Efron Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation , 1983 .

[13]  I. T. Jolliffe,et al.  Springer series in statistics , 1986 .

[14]  Changbao Wu,et al.  Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis , 1986 .

[15]  B. Efron How Biased is the Apparent Error Rate of a Prediction Rule , 1986 .

[16]  Ker-Chau Li,et al.  Asymptotic Optimality for $C_p, C_L$, Cross-Validation and Generalized Cross-Validation: Discrete Index Set , 1987 .

[17]  P. Burman A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods , 1989 .

[18]  E. Mammen When Does Bootstrap Work?: Asymptotic Results and Simulations , 1992 .

[19]  P. Hall The Bootstrap and Edgeworth Expansion , 1992 .

[20]  M. Newton,et al.  A Rank Statistics Approach to the Consistency of a General Bootstrap , 1992 .

[21]  Ping Zhang Model Selection Via Multifold Cross Validation , 1993 .

[22]  J. Wellner,et al.  Exchangeably Weighted Bootstraps of the General Empirical Process , 1993 .

[23]  Paul Janssen,et al.  Consistency of the Generalized Bootstrap for Degenerate $U$-Statistics , 1993 .

[24]  A. Tsybakov,et al.  Minimax theory of image reconstruction , 1993 .

[25]  E. Mammen,et al.  On General Resampling Algorithms and their Performance in Distribution Estimation , 1994 .

[26]  I. Johnstone,et al.  Adapting to Unknown Smoothness via Wavelet Shrinkage , 1995 .

[27]  P. Bertail,et al.  The Weighted Bootstrap , 1995 .

[28]  C. Mallows More comments on C_p , 1995 .

[29]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[30]  J. Shao Bootstrap Model Selection , 1996 .

[31]  Sam Efromovich,et al.  Sharp-Optimal and Adaptive Estimation for Heteroscedastic Nonparametric Regression , 1996 .

[32]  R. Shibata Bootstrap Estimate of Kullback-Leibler Information for Model Selection , 1997 .

[33]  J. Cavanaugh,et al.  A Bootstrap Variant of AIC for State-Space Model Selection , 1997 .

[34]  G. Kitagawa,et al.  Bootstrapping Log Likelihood and EIC, an Extension of AIC , 1997 .

[35]  R. Tibshirani,et al.  Improvements on Cross-Validation: The 632+ Bootstrap Method , 1997 .

[36]  J. Shao An Asymptotic Theory for Linear Model Selection , 1997 .

[37]  Devdatt P. Dubhashi,et al.  Balls and bins: A study in negative dependence , 1996, Random Struct. Algorithms.

[38]  P. Massart,et al.  Risk bounds for model selection via penalization , 1999 .

[39]  Yuhong Yang,et al.  Information-theoretic determination of minimax rates of convergence , 1999 .

[40]  E. Mammen,et al.  Smooth Discrimination Analysis , 1999 .

[41]  G. Claeskens,et al.  Testing the Fit of a Parametric Function , 1999 .

[42]  Y. Baraud Model selection for regression on a fixed design , 2000 .

[43]  Colin L. Mallows,et al.  Some Comments on C_p , 2000, Technometrics.

[44]  J. Zinn,et al.  Exponential and Moment Inequalities for U-Statistics , 2000, math/0003228.

[45]  Vladimir Koltchinskii,et al.  Rademacher penalties and structural risk minimization , 2001, IEEE Trans. Inf. Theory.

[46]  Luc Devroye,et al.  Combinatorial methods in density estimation , 2001, Springer series in statistics.

[47]  P. Massart,et al.  Gaussian model selection , 2001 .

[48]  Prabir Burman Estimation of equifrequency histograms , 2002 .

[49]  Y. Baraud Model selection for regression on a random design , 2002 .

[51]  Adam Krzyzak,et al.  A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.

[52]  Peter L. Bartlett,et al.  Model Selection and Error Estimation , 2000, Machine Learning.

[53]  Peter L. Bartlett,et al.  Local Complexities for Empirical Risk Minimization , 2004, COLT.

[54]  Jean-Yves Audibert Théorie statistique de l'apprentissage : une approche PAC-Bayésienne , 2004 .

[55]  G. Lugosi,et al.  Complexity regularization via localized random penalties , 2004, math/0410091.

[56]  Anatoly Zhigljavsky,et al.  Approximating the negative moments of the Poisson distribution , 2004 .

[57]  Marko Žnidarič Asymptotic Expansion for Inverse Moments of Binomial and Poisson Distributions , 2005 .

[58]  P. Bartlett,et al.  Local Rademacher complexities , 2005, math/0508275.

[59]  S. Boucheron,et al.  Moment inequalities for functions of independent random variables , 2005, math/0503651.

[60]  C. Scovel,et al.  Concentration of the hypergeometric distribution , 2005 .

[61]  Magalie Fromont,et al.  Model selection by bootstrap penalization for classification , 2004, Machine Learning.

[62]  V. Koltchinskii Local Rademacher complexities and oracle inequalities in risk minimization , 2006, 0708.0083.

[63]  V. Koltchinskii Rejoinder: Local Rademacher complexities and oracle inequalities in risk minimization , 2006, 0708.0135.

[64]  Yuhong Yang Consistency of Cross Validation for Comparing Regression Procedures , 2007, 0803.2963.

[66]  O. Catoni PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning , 2007, 0712.0248.

[67]  P. Massart,et al.  Minimal Penalties for Gaussian Model Selection , 2007 .

[68]  David Hinkley,et al.  Bootstrap Methods: Another Look at the Jackknife , 2008 .

[69]  X. Gendre Simultaneous estimation of the mean and the variance in heteroscedastic Gaussian regression , 2008, 0807.2547.

[70]  L. Galtchouk,et al.  Adaptive asymptotically efficient estimation in heteroscedastic nonparametric regression via model selection , 2008, 0810.1173.

[71]  Sylvain Arlot Technical Appendix to "V-Fold Cross-Validation Improved: V-Fold Penalization" , 2008, 0802.0566.

[72]  Charles J. Stone,et al.  An Asymptotically Optimal Histogram Selection Rule , 2008 .

[73]  Sylvain Arlot,et al.  Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression , 2008 .

[74]  Hideatsu Tsukahara Aad W. van der Vaart and Jon A. Wellner: Weak Convergence and Empirical Processes: With Applications to Statistics, Springer, 1996, xvi + 508 pp. , 2009 .

[75]  Pascal Massart,et al.  Data-driven Calibration of Penalties for Least-Squares Regression , 2008, J. Mach. Learn. Res..

[76]  Marie-Claude Sauvé,et al.  Histogram selection in non gaussian regression , 2009 .

[77]  Sylvain Arlot,et al.  Choosing a penalty for model selection in heteroscedastic regression , 2008, 0812.3141.

[78]  G. Blanchard,et al.  Some nonasymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests , 2007, 0712.0775.