SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

We study the problem of inferring heterogeneous treatment effects from time-to-event data. While both the related problems of (i) estimating treatment effects for binary or continuous outcomes and (ii) predicting survival outcomes have been well studied in the recent machine learning literature, their combination, albeit of high practical relevance, has received considerably less attention. With the ultimate goal of reliably estimating the effects of treatments on instantaneous risk and survival probabilities, we focus on the problem of learning (discrete-time) treatment-specific conditional hazard functions. We find that unique challenges arise in this context due to a variety of covariate shift issues that go beyond a mere combination of well-studied confounding and censoring biases. We theoretically analyse their effects by adapting recent generalization bounds from domain adaptation and treatment effect estimation to our setting, and we discuss the implications for model design. We use the resulting insights to propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations. We investigate performance across a range of experimental settings and empirically confirm that our method outperforms baselines by addressing covariate shifts from various sources.
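The abstract frames the estimation target as treatment-specific discrete-time hazards, from which survival probabilities and treatment effects on survival follow. The sketch below shows only the standard discrete-time relations under stated assumptions. It is not the SurvITE model: it contains no representation learning or balancing. The function names (`survival_from_hazard`, `survival_effect`, `hazard_nll`) are hypothetical. It assumes hazards are given as an array of shape `(n_samples, n_steps)`, and that a sample censored at step tau is treated as having survived step tau. That censoring convention is an assumption, and other conventions exist.

```python
import numpy as np

def survival_from_hazard(hazard):
    """Discrete-time survival S(t) = prod_{k <= t} (1 - h(k)).

    hazard: array of shape (n_samples, n_steps), entries in [0, 1].
    """
    hazard = np.asarray(hazard, dtype=float)
    return np.cumprod(1.0 - hazard, axis=-1)

def survival_effect(hazard_treated, hazard_control):
    """Per-time-step difference S_1(t | x) - S_0(t | x)."""
    return survival_from_hazard(hazard_treated) - survival_from_hazard(hazard_control)

def hazard_nll(hazard, event_step, event, eps=1e-12):
    """Mean negative log-likelihood of right-censored discrete-time data.

    event_step: integer index of the observed time step for each sample.
    event: 1 if the event was observed at event_step, 0 if censored there.
    Assumption (hypothetical convention): a sample censored at step tau
    survived step tau.
    """
    hazard = np.clip(np.asarray(hazard, dtype=float), eps, 1.0 - eps)
    event_step = np.asarray(event_step, dtype=int)
    event = np.asarray(event, dtype=float)
    n, n_steps = hazard.shape
    steps = np.arange(n_steps)
    # Steps strictly before the observed step: the sample survived them.
    before = steps[None, :] < event_step[:, None]
    log_lik = np.sum(np.where(before, np.log(1.0 - hazard), 0.0), axis=1)
    # Observed step: event contributes log h, censoring contributes log(1 - h).
    h_tau = hazard[np.arange(n), event_step]
    log_lik += event * np.log(h_tau) + (1.0 - event) * np.log(1.0 - h_tau)
    return -np.mean(log_lik)

# Toy usage with made-up hazards (illustrative inputs, not results):
h_treated = np.array([[0.1, 0.1, 0.1]])
h_control = np.array([[0.2, 0.2, 0.2]])
effect = survival_effect(h_treated, h_control)  # roughly [[0.1, 0.17, 0.217]]
```

Working with hazards rather than survival curves directly keeps the censored likelihood a simple sum of per-step Bernoulli terms. Any survival-based effect, such as the difference in survival at a horizon t, can then be recovered by the cumulative product.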
