A Model Selection Criterion for High-Dimensional Linear Regression

Statistical model selection is a particularly challenging task when the number of available measurements is much smaller than the dimension of the parameter space. We study model selection in the context of subset selection for high-dimensional linear regression. Accordingly, we propose a new model selection criterion based on the Fisher information that selects a parsimonious model among all combinatorial models up to a prescribed maximum sparsity level. We analyze the performance of our criterion as the number of measurements grows to infinity and as the noise variance tends to zero, and in each case we prove that the proposed criterion selects the true model with probability approaching one. Additionally, we devise a computationally affordable algorithm for carrying out model selection with the proposed criterion in practice. Interestingly, as a by-product, our algorithm can provide the ideal regularization parameter for the Lasso estimator, i.e., the value for which Lasso selects the true variables. Finally, numerical simulations are included to support our theoretical findings.
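
To make the subset-selection setting concrete, the following minimal sketch generates a small synthetic high-dimensional regression problem (n much smaller than p) and exhaustively scores every candidate support up to a maximum sparsity level with a generic BIC-style penalty. The penalty, the dimensions, and all variable names are illustrative assumptions only; the paper's Fisher-information-based criterion is not reproduced here.

```python
# Illustrative sketch only: exhaustive subset selection on a synthetic
# high-dimensional regression problem, scored with a BIC-style penalty
# as a stand-in for the paper's Fisher-information-based criterion.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, p, k_max = 20, 30, 3                      # n << p regime, max sparsity level
X = rng.standard_normal((n, p))
beta = np.zeros(p)
true_support = (0, 1, 2)                     # assumed sparse ground truth
beta[list(true_support)] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

best_score, best_support = np.inf, ()
for k in range(1, k_max + 1):
    for S in combinations(range(p), k):
        Xs = X[:, S]
        # Least-squares fit restricted to the candidate support S
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ coef) ** 2)
        # BIC-style placeholder score: fit term plus complexity penalty
        score = n * np.log(rss / n) + k * np.log(n)
        if score < best_score:
            best_score, best_support = score, S

print("selected support:", best_support, "true support:", true_support)
```

The exhaustive loop is only feasible for toy dimensions; the computationally affordable algorithm described in the abstract is what makes the criterion usable at realistic problem sizes.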
