Factor models and variable selection in high-dimensional regression analysis

The paper considers linear regression problems in which the number of predictor variables may exceed the sample size. The basic motivation of the study is to combine the points of view of model selection and functional regression by using a factor approach: it is assumed that the predictor vector can be decomposed into a sum of two uncorrelated random components reflecting common factors and specific variabilities of the explanatory variables. It is shown that the traditional assumption of a sparse vector of parameters is restrictive in this context. Common factors may have a significant influence on the response variable that cannot be captured by the specific effects of a small number of individual variables. We therefore propose to include principal components as additional explanatory variables in an augmented regression model. We give finite sample inequalities for estimates of these components. It is then shown that model selection procedures can be used to estimate the parameters of the augmented model, and we derive theoretical properties of the estimators. Finite sample performance is illustrated by a simulation study.
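
The augmented-regression idea described above can be illustrated with a small numerical sketch. It is not the authors' procedure: the simulated design, the function names (`simulate`, `principal_components`, `lasso_ista`, `objective`), the choice of the Lasso solved by proximal gradient steps (ISTA), the penalty level, and the decision to leave the principal-component coefficients unpenalized are all assumptions made for illustration. The sketch generates predictors as common factors plus specific components, estimates the leading principal components, and fits a Lasso on the design augmented with those components.

```python
import numpy as np

def simulate(n=100, p=200, k=2, seed=0):
    """Toy factor model (assumed): X = F @ L + Z, with y driven by F and a sparse part of Z."""
    rng = np.random.default_rng(seed)
    F = rng.normal(size=(n, k))            # common factors
    L = rng.normal(size=(k, p))            # factor loadings
    Z = rng.normal(size=(n, p))            # specific components, independent of F
    X = F @ L + Z
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]            # sparse specific effects
    alpha = np.linspace(3.0, -2.0, k)      # effects of the common factors
    y = F @ alpha + Z @ beta + 0.5 * rng.normal(size=n)
    return X, y

def principal_components(X, k):
    """First k principal component scores, scaled to unit empirical second moment."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * np.sqrt(X.shape[0])

def objective(A, y, w, lam, penalized):
    """Lasso objective (1/2n)||y - A w||^2 + lam * sum_j penalized_j |w_j|."""
    n = A.shape[0]
    return 0.5 * np.sum((y - A @ w) ** 2) / n + lam * np.sum(penalized * np.abs(w))

def lasso_ista(A, y, lam, penalized, n_iter=500):
    """Proximal gradient (ISTA) for the Lasso; `penalized` is a 0/1 weight per column."""
    n = A.shape[0]
    step = n / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = w - step * (A.T @ (A @ w - y)) / n
        thr = step * lam * penalized       # soft-threshold only the penalized entries
        w = np.sign(w) * np.maximum(np.abs(w) - thr, 0.0)
    return w

# Demo: augment the centered predictors with k estimated principal components.
X, y = simulate()
k = 2                                       # assumed known here; in practice it must be chosen
pcs = principal_components(X, k)
A = np.hstack([pcs, X - X.mean(axis=0)])
penalized = np.r_[np.zeros(k), np.ones(X.shape[1])]
w = lasso_ista(A, y - y.mean(), lam=0.2, penalized=penalized)
print("PC coefficients:", w[:k])
print("nonzero individual-variable coefficients:", np.flatnonzero(w[k:]))
```

The number of components `k` and the penalty `lam` are fixed by hand here; a data-driven choice of either is outside the scope of this sketch.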
