Confidence sets for model selection by F-testing

We introduce the notion of a variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single lucky winner based on a model selection criterion. The VSCS extends the usual notion of confidence intervals to the variable selection problem: a VSCS is a set of regression models that contains the true model with a given level of confidence. Although the size of the VSCS properly reflects the model selection uncertainty, without specific assumptions on the true model, the VSCS is typically rather large (unless the number of predictors is small). As a solution, we advocate special attention to the set of lower boundary models (LBMs), which are the most parsimonious models that are not statistically significantly inferior to the full model at a given confidence level. Based on the LBMs, variable importance and measures of co-appearance importance of predictors can be naturally defined.
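The following is a minimal sketch, not the authors' implementation, of the construction described above: a candidate submodel enters the VSCS when the partial F-test comparing it to the full model is not significant at level alpha, and the LBMs are the inclusion-minimal members of that set. It assumes a fixed-design linear model with an intercept, n > p + 1 observations, and exhaustive enumeration over all 2^p predictor subsets (feasible only for small p, consistent with the caveat above); the function names `vscs` and `lbms` are hypothetical.

```python
import itertools

import numpy as np
from scipy.stats import f as f_dist


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def vscs(X, y, alpha=0.05):
    """All predictor subsets not significantly inferior to the full model.

    X is the (n, p) matrix of candidate predictors; an intercept column is
    always included. A subset s is retained when the partial F-test of the
    submodel against the full model has p-value > alpha.
    """
    n, p = X.shape
    ones = np.ones((n, 1))
    rss_full = rss(np.hstack([ones, X]), y)
    df_full = n - (p + 1)  # residual degrees of freedom; assumes n > p + 1
    members = []
    for k in range(p + 1):
        for s in itertools.combinations(range(p), k):
            dropped = p - len(s)  # parameters removed relative to full model
            if dropped == 0:  # the full model is always in the VSCS
                members.append(s)
                continue
            sub = np.hstack([ones, X[:, list(s)]]) if s else ones
            F = ((rss(sub, y) - rss_full) / dropped) / (rss_full / df_full)
            if f_dist.sf(F, dropped, df_full) > alpha:  # not significantly worse
                members.append(s)
    return members


def lbms(members):
    """Lower boundary models: members with no proper subset also in the set."""
    sets = [frozenset(m) for m in members]
    return [m for m in sets if not any(other < m for other in sets)]
```

Under this sketch, a natural variable-importance measure falls out directly: the relative frequency with which each predictor appears across the LBMs, and co-appearance importance from how often pairs of predictors appear in the same LBM. Note that retaining a model when the p-value exceeds alpha inverts the usual testing logic, since the goal is a set guaranteed to cover the true model, not to reject it.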
