Double/Debiased Machine Learning for Treatment and Structural Parameters

We revisit the classic semiparametric problem of inference on a low-dimensional parameter θ_0 in the presence of high-dimensional nuisance parameters η_0. We depart from the classical setting by allowing η_0 to be so high-dimensional that the traditional assumptions that limit the complexity of its parameter space, such as Donsker properties, break down. To estimate η_0, we consider statistical or machine learning (ML) methods, which are particularly well suited to estimation in modern, very high-dimensional cases. ML methods perform well by employing regularization to reduce variance and by trading off regularization bias against overfitting in practice. However, both regularization bias and overfitting in estimating η_0 cause a heavy bias in estimators of θ_0 that are obtained by naively plugging ML estimators of η_0 into estimating equations for θ_0. Because of this bias, the naive estimator fails to be N^(-1/2)-consistent, where N is the sample size. We show that the impact of regularization bias and overfitting on estimation of the parameter of interest θ_0 can be removed by using two simple, yet critical, ingredients: (1) using Neyman-orthogonal moments/scores, which have reduced sensitivity to the nuisance parameters, to estimate θ_0, and (2) making use of cross-fitting, which provides an efficient form of data-splitting. We call the resulting set of methods double or debiased ML (DML). We verify that DML delivers point estimators that concentrate in an N^(-1/2)-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements.
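As an illustration of ingredient (1), the following sketch writes out one Neyman-orthogonal score for a partially linear regression model. The notation (W, g_0, m_0, ℓ_0, U, V, ψ) is introduced here for illustration and is an assumption of this sketch, not notation fixed by the text above.

```latex
% Partially linear regression model (illustrative notation), data W = (Y, D, X):
%   Y = D\theta_0 + g_0(X) + U,  E[U \mid X, D] = 0,
%   D = m_0(X) + V,              E[V \mid X] = 0.
% Nuisance parameter: \eta = (\ell, m), with true values
%   \ell_0(X) = E[Y \mid X] and m_0(X) = E[D \mid X].
\[
  \psi(W; \theta, \eta)
    = \bigl( Y - \ell(X) - \theta\,(D - m(X)) \bigr)\,\bigl( D - m(X) \bigr).
\]
% Moment condition and Neyman orthogonality (vanishing Gateaux derivative
% with respect to the nuisance parameter at the truth):
\[
  E\,\psi(W; \theta_0, \eta_0) = 0,
  \qquad
  \frac{\partial}{\partial r}\,
  E\,\psi\bigl(W; \theta_0, \eta_0 + r(\eta - \eta_0)\bigr)\Big|_{r=0} = 0 .
\]
```

The second condition holds here because Y - ℓ_0(X) - θ_0 V = U, and both U and V have conditional mean zero given X, so first-order perturbations of ℓ or m do not move the moment.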
The generic statistical theory of DML is elementary and relies on only weak theoretical requirements, which admit the use of a broad array of modern ML methods for estimating the nuisance parameters, such as random forests, lasso, ridge, deep neural nets, boosted trees, and various hybrids and ensembles of these methods. We illustrate the general theory by deriving theoretical properties of DML applied to learn the main regression parameter in a partially linear regression model, the coefficient on an endogenous variable in a partially linear instrumental variables model, the average treatment effect and the average treatment effect on the treated under unconfoundedness, and the local average treatment effect in an instrumental variables setting. In addition to these theoretical applications, we also illustrate the use of DML in three empirical examples.
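The two ingredients can be sketched for the partially linear regression model Y = Dθ_0 + g_0(X) + U, D = m_0(X) + V. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`ridge_poly_fit`, `dml_plr`), the polynomial ridge learner standing in for a generic ML method, the pooling of residuals across folds, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def ridge_poly_fit(x, y, degree=5, alpha=1e-3):
    """Hypothetical stand-in learner: ridge regression on polynomial features of a scalar x."""
    features = np.vander(x, degree + 1, increasing=True)
    gram = features.T @ features + alpha * np.eye(degree + 1)
    coef = np.linalg.solve(gram, features.T @ y)
    return lambda x_new: np.vander(x_new, degree + 1, increasing=True) @ coef


def dml_plr(y, d, x, n_folds=5, seed=0):
    """Cross-fitted estimate of theta in Y = D*theta + g(X) + U using the
    orthogonal 'partialling-out' score (Y - l(X) - theta*(D - m(X))) * (D - m(X))."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Cross-fitting: nuisance functions are fit on the other folds only.
        l_hat = ridge_poly_fit(x[train], y[train])  # estimate of E[Y | X]
        m_hat = ridge_poly_fit(x[train], d[train])  # estimate of E[D | X]
        y_res[test_idx] = y[test_idx] - l_hat(x[test_idx])
        d_res[test_idx] = d[test_idx] - m_hat(x[test_idx])
    # Solve the empirical orthogonal moment condition, pooling all folds.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Plug-in standard error from the estimated score and its Jacobian.
    psi = (y_res - theta * d_res) * d_res
    jac = np.mean(d_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / jac ** 2 / n)
    return theta, se


# Simulated data (assumed design, for illustration only); true theta is 0.5.
rng = np.random.default_rng(123)
n_obs = 2000
x = rng.uniform(-1.0, 1.0, n_obs)
d = np.sin(2.0 * x) + rng.normal(size=n_obs)
y = 0.5 * d + x ** 2 + np.cos(3.0 * x) + rng.normal(size=n_obs)

theta_hat, se_hat = dml_plr(y, d, x)
ci_95 = (theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat)
```

Any learner with a fit-then-predict interface could replace `ridge_poly_fit`; the essential design points are that each observation's residuals come from nuisance fits that did not use that observation, and that θ is obtained from the orthogonal score rather than from a naive plug-in regression.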
