Two-Step Estimation and Inference with Possibly Many Included Covariates

We study the implications of including many covariates in a first-step estimate entering a two-step estimation procedure. We find that a first-order bias emerges when the number of included covariates is “large” relative to the square root of the sample size, rendering standard inference procedures invalid. We show that the jackknife is able to estimate this “many covariates” bias consistently, thereby delivering a new automatic bias-corrected two-step point estimator. The jackknife also consistently estimates the standard error of the original two-step point estimator. For inference, we develop a valid post-bias-correction bootstrap approximation that accounts for the additional variability introduced by the jackknife bias-correction. We find that the jackknife bias-corrected point estimator and the bootstrap post-bias-correction inference perform very well in simulations, offering important improvements over conventional two-step point estimators and inference procedures, which are not robust to including many covariates. We apply our results to an array of distinct treatment effect, policy evaluation, and other applied microeconomics settings. In particular, we discuss production function and marginal treatment effect estimation in detail.
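As a rough illustration of the jackknife step described above, the sketch below applies the textbook delete-one jackknife to a toy two-step estimator. It is not the paper's exact procedure: the estimator `two_step_estimate`, the simulated data, and all variable names are hypothetical choices made for this example, and the post-bias-correction bootstrap is not shown. The toy first step is an OLS fit on `k` covariates, and the second step averages the squared fitted values, a nonlinear functional whose plug-in version picks up first-step estimation noise that grows with `k`.

```python
import numpy as np


def two_step_estimate(x, z):
    """Hypothetical two-step estimator used only for illustration.

    Step 1: OLS of x on the covariate matrix z.
    Step 2: average of the squared first-step fitted values.
    """
    coef, *_ = np.linalg.lstsq(z, x, rcond=None)
    fitted = z @ coef
    return float(np.mean(fitted ** 2))


def jackknife(estimator, *arrays):
    """Delete-one jackknife bias correction and standard error.

    Returns (theta_hat, theta_bc, se), where theta_hat is the full-sample
    estimate, theta_bc subtracts the jackknife bias estimate, and se is the
    jackknife standard error of theta_hat.
    """
    n = arrays[0].shape[0]
    theta_hat = estimator(*arrays)

    # Re-estimate both steps with observation i left out.
    loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        loo[i] = estimator(*(a[keep] for a in arrays))

    loo_mean = loo.mean()
    bias = (n - 1) * (loo_mean - theta_hat)  # jackknife bias estimate
    theta_bc = theta_hat - bias              # bias-corrected estimate
    se = np.sqrt((n - 1) / n * np.sum((loo - loo_mean) ** 2))
    return theta_hat, theta_bc, se


if __name__ == "__main__":
    # Simulated data (assumed design): only the first covariate matters.
    rng = np.random.default_rng(0)
    n, k = 200, 20
    z = rng.normal(size=(n, k))
    x = z[:, 0] + rng.normal(size=n)

    theta_hat, theta_bc, se = jackknife(two_step_estimate, x, z)
    print(f"plug-in: {theta_hat:.3f}  bias-corrected: {theta_bc:.3f}  se: {se:.3f}")
```

Re-running both steps inside each leave-one-out replication is deliberate: the correction targets the bias that the first-step fit passes into the second step, so the first step cannot be held fixed across replications.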
