Model selection procedure for high‐dimensional data

In high-dimensional regression, the number of predictors may greatly exceed the sample size, yet only a small fraction of them are related to the response. Variable selection is therefore essential, and consistent model selection is the primary concern. However, conventional consistent model selection criteria such as the Bayesian information criterion (BIC) may be inadequate because they do not adapt to the model space and because exhaustive search over all candidate models is infeasible. To address these two issues, we establish a lower bound on the probability that an information criterion selects the smallest true model, and based on this bound we propose a model selection criterion, which we call RICc, that adapts to the model space. Furthermore, we develop a computationally feasible method that combines the computational efficiency of least angle regression (LAR) with RICc. Both theoretical and simulation studies show that this method identifies the smallest true model with probability converging to one, provided that the smallest true model is selected by LAR. The proposed method is applied to real data from the power market and outperforms backward variable selection in terms of price forecasting accuracy. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 350-358, 2010
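The two-stage idea, using LAR to generate a nested family of candidate models and then scoring each candidate with an RIC-type information criterion, can be sketched as follows. The abstract does not give the RICc penalty, so the sketch substitutes the classical RIC-style penalty 2k log(p) of Foster and George (1994) as a placeholder; the function name lar_with_ric, the scoring formula, and the simulated data are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import lars_path, LinearRegression


def lar_with_ric(X, y):
    """Score the LAR solution path with an RIC-style criterion.

    Placeholder penalty 2*k*log(p) stands in for the paper's RICc,
    which is not specified in the abstract.
    """
    n, p = X.shape

    # Stage 1: LAR produces a nested sequence of candidate supports.
    _, _, coefs = lars_path(X, y, method="lar")
    candidates = []
    for j in range(coefs.shape[1]):
        support = tuple(np.flatnonzero(coefs[:, j]))
        if 0 < len(support) < n:
            candidates.append(support)

    # Stage 2: refit each candidate by OLS and keep the best score.
    best, best_score = None, np.inf
    for support in dict.fromkeys(candidates):  # preserve order, drop duplicates
        Xs = X[:, list(support)]
        resid = y - LinearRegression().fit(Xs, y).predict(Xs)
        rss = max(float(resid @ resid), 1e-12)  # guard against exact fits
        k = len(support)
        score = n * np.log(rss / n) + 2.0 * k * np.log(p)  # RIC-style penalty
        if score < best_score:
            best, best_score = support, score
    return best, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 200
    X = rng.standard_normal((n, p))
    y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.standard_normal(n)
    print(lar_with_ric(X, y))  # expect a support containing columns 0, 1, 2
```

Restricting the search to the LAR path reduces the candidate set from all 2^p subsets to a nested family of roughly min(n, p) models, which is what makes criterion-based comparison computationally feasible in this setting.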
