Invariant Representation Learning for Treatment Effect Estimation

The defining challenge for causal inference from observational data is the presence of 'confounders', covariates that affect both treatment assignment and the outcome. To address this challenge, practitioners collect and adjust for covariates, hoping that they adequately correct for confounding. However, adjusting for every observed covariate risks including 'bad controls', variables that induce bias when they are conditioned on. The problem is that we do not always know which variables in the covariate set are safe to adjust for and which are not. To address this problem, we develop Nearly Invariant Causal Estimation (NICE). NICE uses invariant risk minimization (IRM) [Arj19] to learn a representation of the covariates that, under some assumptions, strips out bad controls but preserves sufficient information to adjust for confounding. Adjusting for the learned representation, rather than the covariates themselves, avoids the induced bias and provides valid causal inferences. NICE is appropriate in the following setting: i) we observe data from multiple environments that share a common causal mechanism for the outcome but differ in other ways; ii) in each environment, the collected covariates are a superset of the causal parents of the outcome and contain sufficient information for causal identification; iii) the covariates may also contain bad controls, and it is unknown which covariates are safe to adjust for and which induce bias. We evaluate NICE on both synthetic and semi-synthetic data. When the covariates contain unknown collider variables and other bad controls, NICE performs better than existing methods that adjust for all the covariates.
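The two ideas the abstract relies on, collider bias and invariance across environments, can be illustrated with a small simulation. The sketch below is not the NICE implementation. It assumes a hypothetical linear-Gaussian model with a true treatment effect of 2.0, one confounder `x`, and one collider `c` whose noise scale is the only thing that differs between two environments. All function names are illustrative. The penalty is the squared gradient of the per-environment squared-error risk with respect to a scalar multiplier fixed at 1.0, in the style of the IRM penalty, applied here to already-fitted linear predictors rather than a learned representation.

```python
import numpy as np

TRUE_EFFECT = 2.0  # assumed treatment effect, used only in this simulation


def simulate_env(n, collider_noise, rng):
    """Draw one environment; only the collider's noise scale varies."""
    x = rng.normal(size=n)                             # confounder
    t = (x + rng.normal(size=n) > 0).astype(float)     # treatment depends on x
    y = TRUE_EFFECT * t + x + rng.normal(size=n)       # invariant outcome mechanism
    c = t + y + collider_noise * rng.normal(size=n)    # collider: child of t and y
    return t, x, y, c


def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, columns):
    """Linear prediction from an intercept-first coefficient vector."""
    return coef[0] + sum(b * col for b, col in zip(coef[1:], columns))


def irm_penalty(pred, y):
    """Squared gradient of the squared-error risk w.r.t. a scalar multiplier at 1.0."""
    grad = np.mean(2.0 * (pred - y) * pred)
    return grad ** 2


rng = np.random.default_rng(0)
envs = [simulate_env(20_000, scale, rng) for scale in (0.5, 2.0)]
t, x, y, c = (np.concatenate(parts) for parts in zip(*envs))

# Pooled regressions: adjust for the confounder only, or for every covariate.
coef_safe = ols([t, x], y)
coef_all = ols([t, x, c], y)
effect_safe = coef_safe[1]
effect_all = coef_all[1]

# Invariance penalty of each pooled predictor, summed over environments.
penalty_safe = sum(
    irm_penalty(predict(coef_safe, [te, xe]), ye) for te, xe, ye, ce in envs
)
penalty_all = sum(
    irm_penalty(predict(coef_all, [te, xe, ce]), ye) for te, xe, ye, ce in envs
)

print(f"effect, adjusting for x only:  {effect_safe:.3f}")
print(f"effect, adjusting for x and c: {effect_all:.3f}")
print(f"invariance penalty without c:  {penalty_safe:.5f}")
print(f"invariance penalty with c:     {penalty_all:.5f}")
```

Under these assumptions, adjusting for the confounder alone recovers an estimate close to the simulated effect, while adding the collider pulls the estimate well away from it. The predictor that uses the collider also incurs a much larger invariance penalty, because the best coefficient on `c` changes with the environment. That is the kind of signal an invariance-based method can use to discard a bad control.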
