THE FACTOR-LASSO AND K-STEP BOOTSTRAP APPROACH FOR INFERENCE IN HIGH-DIMENSIONAL ECONOMIC APPLICATIONS

We consider inference about coefficients on a small number of variables of interest in a linear panel data model with additive unobserved individual- and time-specific effects and a large number of additional time-varying confounding variables. We allow the number of these confounding variables to exceed the sample size and suppose that, in addition to unrestricted time- and individual-specific effects, they are generated by a small number of common factors and high-dimensional weakly dependent disturbances. Both the factors and the disturbances may be related to the outcome variable and other variables of interest. To make informative inference feasible, we impose that the contribution of the part of the confounding variables not captured by time-specific effects, individual-specific effects, or the common factors can be captured by a relatively small number of terms whose identities are unknown. Within this framework, we provide a convenient computational algorithm for inference about the parameters of interest, based on factor extraction followed by lasso regression, and show that the resulting procedure has good asymptotic properties. We also provide a simple k-step bootstrap procedure that may be used to construct inferential statements about the parameters of interest and prove its asymptotic validity. The proposed bootstrap may be of substantive independent interest outside the present context, as it may readily be adapted to other settings involving inference after lasso variable selection, and the proof of its validity requires some new technical arguments. We also provide simulation evidence on the performance of our procedure and illustrate its use in two empirical applications.
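
To make the "factor extraction followed by lasso regression" idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a balanced panel, removes the additive effects by two-way demeaning, estimates factors by pooled principal components of the demeaned confounders, and then runs lasso of both the outcome and the variable of interest on the defactored confounders before a final least-squares step on the union of selected columns. The pooled principal-components step, the union-of-selections step, the fixed penalty `lam`, and all function names (`two_way_demean`, `lasso_cd`, `residualize`, `factor_lasso`) are illustrative choices that the abstract does not specify.

```python
import numpy as np

def two_way_demean(a):
    """Remove additive individual and time effects from an (N, T, ...) array."""
    return (a - a.mean(axis=0, keepdims=True)
              - a.mean(axis=1, keepdims=True)
              + a.mean(axis=(0, 1), keepdims=True))

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]       # drop column j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]       # add the updated contribution back
    return b

def residualize(v, F):
    """Residual of v (vector or matrix) after least-squares projection on F."""
    if F.shape[1] == 0:
        return v
    coef, *_ = np.linalg.lstsq(F, v, rcond=None)
    return v - F @ coef

def factor_lasso(y, d, X, n_factors, lam):
    """y, d: (N, T) arrays; X: (N, T, p) array of confounders.

    Returns the estimated coefficient on d and the selected column indices.
    """
    N, T, p = X.shape
    yt = two_way_demean(y).reshape(N * T)
    dt = two_way_demean(d).reshape(N * T)
    Xt = two_way_demean(X).reshape(N * T, p)

    # Step 1 (assumed form): pooled principal components of demeaned X.
    U, s, _ = np.linalg.svd(Xt, full_matrices=False)
    F = U[:, :n_factors] * s[:n_factors]

    # Step 2: partial the estimated factors out of every variable.
    y_r, d_r, X_r = residualize(yt, F), residualize(dt, F), residualize(Xt, F)

    # Step 3 (assumed form): lasso of y and of d on the defactored confounders.
    sel = np.union1d(np.flatnonzero(lasso_cd(X_r, y_r, lam)),
                     np.flatnonzero(lasso_cd(X_r, d_r, lam))).astype(int)

    # Step 4: least squares of y on d after partialling out selected columns.
    y_f = residualize(y_r, X_r[:, sel])
    d_f = residualize(d_r, X_r[:, sel])
    return float(d_f @ y_f / (d_f @ d_f)), sel
```

A call such as `factor_lasso(y, d, X, n_factors=2, lam=0.15)` returns a point estimate and the indices of the selected confounders. In practice the number of factors and the penalty level would need a principled choice, and standard errors or a bootstrap would be required for inference; neither is attempted in this sketch.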
