Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

We consider the efficient estimation of a low-dimensional parameter in an estimating equation involving high-dimensional nuisances that depend on the parameter of interest. An important example is the (local) quantile treatment effect ((L)QTE) in causal inference, for which the efficient estimating equation involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated. Debiased machine learning (DML) is a data-splitting approach that allows nuisances to be estimated with flexible machine learning methods that may not satisfy strong metric entropy conditions, but applying it to problems with parameter-dependent nuisances is impractical. For (L)QTE estimation, DML requires that we learn the whole conditional cumulative distribution function, conditioned on potentially high-dimensional covariates, which is far more challenging than the standard supervised regression task in machine learning. We instead propose localized debiased machine learning (LDML), a new data-splitting approach that avoids this burdensome step and needs only to estimate the nuisances at a single initial rough guess for the parameter. For (L)QTE estimation, this involves learning just two binary regression (i.e., classification) models, for which many standard, time-tested machine learning methods exist, and the initial rough guess may be given by inverse propensity weighting. We prove that, under lax rate conditions on the nuisances, our estimator has the same favorable asymptotic behavior as the infeasible oracle estimator that solves the estimating equation with the unknown true nuisance functions. Thus, our proposed approach uniquely enables practically feasible and theoretically grounded efficient estimation of important quantities in causal inference, such as (L)QTEs, and in other coarsened data settings.
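
The localization idea can be illustrated with a minimal sketch for a single quantile of the treated potential outcome under unconfoundedness. It is not the paper's implementation, and several things are assumptions made only for illustration: the estimating equation is taken to be the standard doubly robust form (1/n) Σ [T/e(X) · (1{Y ≤ θ} − F(θ₀|X)) + F(θ₀|X)] = τ, with the conditional CDF frozen at a rough guess θ₀; the two binary regressions are interpreted as the propensity e(X) and the indicator 1{Y ≤ θ₀} among treated units; the covariate is discrete, so a stratum frequency stands in for a machine learning classifier; and each fold's rough guess and nuisances are fit on the same out-of-fold data, which may differ from the paper's splitting scheme. All function names are hypothetical.

```python
import random

def clip(p, lo=0.01, hi=0.99):
    """Keep estimated probabilities away from 0 and 1 (an illustrative safeguard)."""
    return min(max(p, lo), hi)

def weighted_quantile(values, weights, level):
    """Smallest value whose normalized weighted ECDF is at least `level`."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for v, w in pairs:
        running += w
        if running >= level * total:
            return v
    return pairs[-1][0]

def fit_stratum_mean(xs, labels):
    """Toy binary regression: mean of a 0/1 label within each covariate stratum."""
    sums, counts = {}, {}
    for x, b in zip(xs, labels):
        sums[x] = sums.get(x, 0.0) + b
        counts[x] = counts.get(x, 0) + 1
    overall = sum(labels) / len(labels)
    return lambda x: sums[x] / counts[x] if x in counts else overall

def ldml_quantile(X, T, Y, tau, n_folds=5, seed=0):
    """Cross-fitted estimate of the tau-quantile of the treated potential outcome."""
    n = len(Y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    weight = [0.0] * n  # T_i / e_hat(X_i)
    const = [0.0] * n   # F_hat(theta_init | X_i) * (1 - T_i / e_hat(X_i))
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        # Binary regression 1: propensity score P(T = 1 | X).
        e_hat = fit_stratum_mean([X[i] for i in train], [T[i] for i in train])
        treated = [i for i in train if T[i] == 1]
        # Rough initial guess: inverse-propensity-weighted quantile.
        theta_init = weighted_quantile(
            [Y[i] for i in treated],
            [1.0 / clip(e_hat(X[i])) for i in treated],
            tau,
        )
        # Binary regression 2: P(Y <= theta_init | X, T = 1), at the guess only.
        cdf_hat = fit_stratum_mean(
            [X[i] for i in treated],
            [1.0 if Y[i] <= theta_init else 0.0 for i in treated],
        )
        for i in fold:
            w = T[i] / clip(e_hat(X[i]))
            weight[i] = w
            const[i] = cdf_hat(X[i]) * (1.0 - w)
    # Solve sum_i [weight_i * 1{Y_i <= theta} + const_i] = tau * n for theta.
    # The left side is a nondecreasing step function of theta, so scan sorted outcomes.
    target = tau * n - sum(const)
    treated_sorted = sorted((Y[i], weight[i]) for i in range(n) if T[i] == 1)
    running = 0.0
    for y, w in treated_sorted:
        running += w
        if running >= target:
            return y
    return treated_sorted[-1][0]

def simulate(n, seed=0):
    """Synthetic confounded data; only treated outcomes are used by the estimator."""
    rng = random.Random(seed)
    X, T, Y = [], [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < 0.3 + 0.4 * x else 0
        X.append(x)
        T.append(t)
        Y.append(x + rng.gauss(0.0, 1.0))  # true median of Y(1) is 0.5 by symmetry
    return X, T, Y

if __name__ == "__main__":
    X, T, Y = simulate(4000, seed=1)
    print("LDML median estimate:", ldml_quantile(X, T, Y, tau=0.5))
```

In this simulation the treated group over-represents the high-outcome stratum, so the unweighted median among treated units would be biased upward. The point of the sketch is that the conditional CDF is never estimated as a function of θ: only its value at the rough guess enters the estimating equation, which is then solved over θ through the indicator term alone.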