Model selection and estimation in regression with grouped variables

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations, with the multifactor analysis‐of‐variance problem as the most important and well‐known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non‐negative garrotte for factor selection. These three are recently proposed regression methods that can be used to select individual variables. We study their extensions to factor selection, propose efficient algorithms for them, and show that the extensions outperform traditional stepwise backward elimination in factor selection problems. We also study the similarities and differences between the methods, and illustrate them with simulations and real examples.
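To make the idea of selecting whole factors concrete, here is a minimal sketch of a group-penalized least squares fit. It rests on several assumptions not stated in the summary: the objective is taken to be 0.5 * ||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2, a commonly used form of the lasso extension to groups, and the solver is plain proximal gradient descent, swapped in for brevity rather than the algorithms that the paper proposes. The names `group_soft_threshold`, `group_lasso`, `groups` and `lam`, as well as the simulated data, are hypothetical.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Shrink the vector v toward zero as a block; return zeros if ||v|| <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent for the assumed objective
    0.5 * ||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2.

    `groups` is a list of index arrays, one per factor (hypothetical layout).
    """
    beta = np.zeros(X.shape[1])
    # Step size 1 / L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        # The penalty is separable over groups, so each block is shrunk on its own.
        for idx in groups:
            beta[idx] = group_soft_threshold(z[idx], step * lam * np.sqrt(len(idx)))
    return beta


# Simulated data (illustrative only): two factors with three columns each,
# and only the first factor affects the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]

beta_hat = group_lasso(X, y, groups, lam=20.0)
print(np.round(beta_hat, 3))
```

Because the threshold acts on the Euclidean norm of each block, all coefficients of a factor enter or leave the model together, which is what distinguishes factor selection from selecting individual variables.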
