High-Dimensional Feature Selection for Sample Efficient Treatment Effect Estimation

The estimation of causal treatment effects from observational data is a fundamental problem in causal inference. To avoid bias, the effect estimator must control for all confounders. Hence practitioners often collect data for as many covariates as possible to raise the chances of including the relevant confounders. While this addresses the bias, this has the side effect of significantly increasing the number of data samples required to accurately estimate the effect due to the increased dimensionality. In this work, we consider the setting where out of a large number of covariates $X$ that satisfy strong ignorability, an unknown sparse subset $S$ is sufficient to include to achieve zero bias, i.e. $c$-equivalent to $X$. We propose a common objective function involving outcomes across treatment cohorts with nonconvex joint sparsity regularization that is guaranteed to recover $S$ with high probability under a linear outcome model for $Y$ and subgaussian covariates for each of the treatment cohort. This improves the effect estimation sample complexity so that it scales with the cardinality of the sparse subset $S$ and $\log |X|$, as opposed to the cardinality of the full set $X$. We validate our approach with experiments on treatment effect estimation.

[1]  S. Geer,et al.  Oracle Inequalities and Optimal Inference under Group Sparsity , 2010, 1007.1771.

[2]  Roman Vershynin,et al.  Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.

[3]  Mihaela van der Schaar,et al.  GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets , 2018, ICLR.

[4]  Ashkan Ertefaie,et al.  Outcome‐adaptive lasso: Variable selection for causal inference , 2017, Biometrics.

[5]  Xiaojie Mao,et al.  Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding , 2018, AISTATS.

[6]  Soeren Reinhold Kuenzel,et al.  Heterogeneous Treatment Effect Estimation Using Machine Learning , 2019 .

[7]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[8]  R. Vershynin How Close is the Sample Covariance Matrix to the Actual Covariance Matrix? , 2010, 1004.3484.

[9]  Max Welling,et al.  Causal Effect Inference with Deep Latent-Variable Models , 2017, NIPS 2017.

[10]  John Duchi,et al.  BOUNDS ON THE CONDITIONAL AND AVERAGE TREATMENT EFFECT WITH UNOBSERVED CONFOUNDING FACTORS. , 2018, Annals of statistics.

[11]  Ruocheng Guo,et al.  A Survey of Learning Causality with Data , 2018, ACM Comput. Surv..

[12]  Aidong Zhang,et al.  Representation Learning for Treatment Effect Estimation from Observational Data , 2018, NeurIPS.

[13]  Po-Ling Loh,et al.  Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima , 2013, J. Mach. Learn. Res..

[14]  Jennifer L. Hill,et al.  Bayesian Nonparametric Modeling for Causal Inference , 2011 .

[15]  Lin Liu,et al.  Sufficient dimension reduction for average causal effect estimation , 2020, Data Mining and Knowledge Discovery.

[16]  Judea Pearl,et al.  Identification of Conditional Interventional Distributions , 2006, UAI.

[17]  G. Imbens,et al.  Large Sample Properties of Matching Estimators for Average Treatment Effects , 2004 .

[18]  Nathan Kallus,et al.  DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training , 2018, ICML.

[19]  Stefan Wager,et al.  Estimation and Inference of Heterogeneous Treatment Effects using Random Forests , 2015, Journal of the American Statistical Association.

[20]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[21]  Junzhou Huang,et al.  The Benefit of Group Sparsity , 2009 .

[22]  Peter Bak,et al.  An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference , 2019, ArXiv.

[23]  Uri Shalit,et al.  Estimating individual treatment effect: generalization bounds and algorithms , 2016, ICML.

[24]  G. Imbens,et al.  Machine Learning Methods for Estimating Heterogeneous Causal Eects , 2015 .

[25]  Po-Ling Loh,et al.  Support recovery without incoherence: A case for nonconvex regularization , 2014, ArXiv.