Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge

Treatment effect estimation from observational data is a fundamental problem in causal inference. Two very different schools of thought have tackled this problem. On the one hand, the Pearlian framework commonly assumes structural knowledge (provided by an expert) in the form of Directed Acyclic Graphs (DAGs) and provides graphical criteria, such as the back-door criterion, to identify valid adjustment sets. On the other hand, the potential outcomes (PO) framework commonly assumes that the full set of observed features satisfies ignorability (i.e., no hidden confounding), which in general is untestable. In this work, we take steps to bridge these two frameworks. We show that knowing only one parent of the treatment variable (provided by an expert) is, quite remarkably, sufficient to test a broad class of (but not all) back-door criteria. Importantly, we also cover the non-trivial case where the entire set of observed features is not ignorable (generalizing the PO framework), without requiring all the parents of the treatment variable to be observed. Our key technical idea involves a more general result: given a synthetic sub-sampling (or environment) variable that is a function of the parent variable, we show that an invariance test involving this sub-sampling variable is equivalent to testing a broad class of back-door criteria. We demonstrate our approach on synthetic data as well as on real causal effect estimation benchmarks.
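
To make the invariance idea concrete, the following is a minimal sketch on a toy linear-Gaussian model, not the paper's implementation. The abstract does not state the exact form of the test, so the sketch assumes it can be read as a conditional independence check between the outcome and the sub-sampling variable, given the treatment and a candidate adjustment set. The toy DAG (`z -> x_p -> t -> y`, `z -> y`), the variable names, and the Fisher z partial-correlation test are all illustrative assumptions.

```python
import math
import numpy as np

def partial_corr_pvalue(a, b, cond):
    """Two-sided Fisher z-test p-value for the partial correlation of a and b
    given the variables in cond (assumes a linear-Gaussian model)."""
    n = len(a)
    design = np.column_stack([np.ones(n)] + list(cond))
    # Residualize a and b on the conditioning variables.
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    r = np.clip(np.corrcoef(res_a, res_b)[0, 1], -0.999999, 0.999999)
    z_stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z_stat) / math.sqrt(2))

# Hypothetical toy data: z confounds t and y through the known parent x_p.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                       # candidate adjustment feature
x_p = 0.8 * z + rng.normal(size=n)           # known parent of the treatment
t = x_p + rng.normal(size=n)                 # treatment (continuous here)
y = 2.0 * t + 1.5 * z + rng.normal(size=n)   # outcome

# Synthetic sub-sampling (environment) variable: a noisy function of x_p.
e = x_p + rng.normal(size=n)

# Assumed invariance check: is y independent of e given t and the candidate set?
p_valid = partial_corr_pvalue(e, y, [t, z])  # {z} blocks the back-door path
p_invalid = partial_corr_pvalue(e, y, [t])   # empty set leaves it open

print(f"candidate set {{z}}: p = {p_valid:.3g}")
print(f"empty candidate set: p = {p_invalid:.3g}")
```

In this toy model, a very small p-value for the empty candidate set signals that the check fails, while the candidate set `{z}` should not be rejected at conventional levels. A nonparametric conditional independence test would be needed outside the linear-Gaussian setting assumed here.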
