Machine Learning for Set-Identified Linear Models

This paper provides estimation and inference methods for an identified set when the selection among a very large number of covariates is based on modern machine learning tools. I characterize the boundary of the identified set (i.e., its support function) using a semiparametric moment condition. Combining Neyman-orthogonality and sample splitting, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the support function and propose a weighted bootstrap procedure to conduct inference about the identified set. I also provide a general method to construct a Neyman-orthogonal moment condition for the support function. Applying my method to the endogenous selection model of Lee (2008), I provide the asymptotic theory for the sharp (i.e., tightest possible) bounds on the Average Treatment Effect in the presence of high-dimensional covariates. Furthermore, I relax the conventional monotonicity assumption and allow the sign of the treatment effect on selection (e.g., employment) to be determined by covariates. Using the JobCorps data set, which has very rich baseline characteristics, I substantially tighten the bounds on the JobCorps effect on wages under the weakened monotonicity assumption.
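To make the notion of sharp bounds under endogenous selection concrete, the following is a minimal sketch of the basic trimming bounds in the no-covariate case, assuming the conventional monotonicity direction (treatment weakly increases selection). It is not the estimator developed in this paper, which handles high-dimensional covariates through orthogonal moments and sample splitting; the function name `trimming_bounds` and its arguments are hypothetical and chosen for illustration only.

```python
import numpy as np


def trimming_bounds(y, d, s):
    """Illustrative no-covariate trimming bounds (hypothetical helper).

    y : outcomes (values where s == 0 are never used)
    d : binary treatment indicator
    s : binary selection indicator (1 if the outcome is observed)

    Assumes treatment weakly increases selection (monotonicity).
    Returns (lower, upper) bounds on the effect for the always-selected.
    """
    y, d, s = (np.asarray(a) for a in (y, d, s))

    # Selection rates in each arm.
    sel_treated = s[d == 1].mean()
    sel_control = s[d == 0].mean()

    # Share of treated-and-selected units that would also be selected
    # without treatment, under the assumed monotonicity direction.
    p0 = sel_control / sel_treated
    if p0 > 1:
        raise ValueError("selection rates contradict the assumed monotonicity")

    y1 = y[(d == 1) & (s == 1)]  # observed treated outcomes
    y0 = y[(d == 0) & (s == 1)]  # observed control outcomes

    # Keep the top (resp. bottom) p0 share of treated outcomes.
    upper = y1[y1 >= np.quantile(y1, 1 - p0)].mean() - y0.mean()
    lower = y1[y1 <= np.quantile(y1, p0)].mean() - y0.mean()
    return lower, upper


# Small usage example with made-up numbers.
d = np.array([1, 1, 1, 1, 0, 0, 0, 0])
s = np.array([1, 1, 1, 1, 1, 1, 0, 0])
y = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 0.0, 0.0])
lower, upper = trimming_bounds(y, d, s)
```

The sketch trims the treated outcome distribution by the ratio of selection rates; with covariates, this trimming share would vary across units, which is where first-stage machine learning estimates would enter.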
