Deconfounding Temporal Autoencoder: Estimating Treatment Effects over Time Using Noisy Proxies

Estimating individualized treatment effects (ITEs) from observational data is crucial for decision-making. To obtain unbiased ITE estimates, a common assumption is that all confounders are observed. In practice, however, we are unlikely to observe these confounders directly. Instead, we often observe noisy measurements of the true confounders, which can serve as valid proxies. In this paper, we address the problem of estimating ITEs in the longitudinal setting where we observe noisy proxies instead of the true confounders. To this end, we develop the Deconfounding Temporal Autoencoder (DTA), a novel method that leverages observed noisy proxies to learn a hidden embedding that reflects the true hidden confounders. In particular, the DTA combines a long short-term memory autoencoder with a causal regularization penalty that renders the potential outcomes and treatment assignment conditionally independent given the learned hidden embedding. Once the hidden embedding is learned via the DTA, state-of-the-art outcome models can be used to control for it and obtain unbiased estimates of ITEs. Using synthetic and real-world medical data, we demonstrate the effectiveness of the DTA, which improves over state-of-the-art benchmarks by a substantial margin.
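
To illustrate why noisy proxies matter, the following minimal sketch simulates a single time step with one hidden confounder and compares three linear estimates of a constant treatment effect: unadjusted, adjusted for the noisy proxy, and adjusted for the true confounder. This is not the DTA and is not taken from the paper; the data-generating process, the coefficients, the noise levels, and the names `run_simulation` and `effect_adjusting_for` are all assumptions chosen for illustration.

```python
import random


def cov(u, v):
    """Population covariance of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def effect_adjusting_for(a, y, c):
    """OLS coefficient on treatment a when regressing y on (1, a, c)."""
    num = cov(a, y) * cov(c, c) - cov(c, y) * cov(a, c)
    den = cov(a, a) * cov(c, c) - cov(a, c) ** 2
    return num / den


def run_simulation(n=20000, seed=0, true_effect=2.0):
    """Hypothetical static example: z is the hidden confounder, x its noisy proxy."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]                 # true confounder (unobserved)
    x = [zi + rng.gauss(0, 1) for zi in z]                  # noisy proxy (observed)
    a = [1.0 if zi + rng.gauss(0, 1) > 0 else 0.0 for zi in z]  # treatment depends on z
    y = [true_effect * ai + 3.0 * zi + rng.gauss(0, 1)      # outcome depends on a and z
         for ai, zi in zip(a, z)]
    return {
        "naive": cov(a, y) / cov(a, a),           # no adjustment
        "proxy": effect_adjusting_for(a, y, x),   # adjust for the noisy proxy only
        "oracle": effect_adjusting_for(a, y, z),  # adjust for the true confounder
    }


if __name__ == "__main__":
    for name, estimate in run_simulation().items():
        print(f"{name:>6}: {estimate:.2f}")
```

Under these assumptions, adjusting for the proxy removes only part of the confounding bias, whereas adjusting for the true confounder recovers the simulated effect up to sampling noise. This gap is the motivation for learning an embedding that reflects the hidden confounders rather than controlling for the proxies directly.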
