Identifying Causal Effect Inference Failure with Uncertainty-Aware Models

Recommending the best course of action for an individual is a major application of individual-level causal effect estimation. This application is often needed in safety-critical domains such as healthcare, where estimating and communicating uncertainty to decision-makers is crucial. We introduce a practical approach for integrating uncertainty estimation into a class of state-of-the-art neural network methods used for individual-level causal estimates. We show that our methods enable us to deal gracefully with situations of "no-overlap", common in high-dimensional data, where standard applications of causal effect approaches fail. Further, our methods allow us to handle covariate shift, where the test distribution differs from the training distribution, a common situation when systems are deployed in practice. We show that when such a covariate shift occurs, correctly modeling uncertainty can keep us from giving overconfident and potentially harmful recommendations. We demonstrate our methodology with a range of state-of-the-art models. Under both covariate shift and lack of overlap, our uncertainty-equipped methods can alert decision-makers when predictions are not to be trusted, while outperforming their uncertainty-oblivious counterparts.
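The core idea can be loosely illustrated in code. The sketch below is not the paper's actual models; it is a minimal toy, assuming a TARNet-style network with two outcome heads (treated and control) and Monte Carlo dropout as the uncertainty mechanism. Dropout stays active at prediction time, each stochastic forward pass yields one sample of the effect y1 - y0, and the spread of those samples serves as the epistemic uncertainty used to flag units (e.g. out-of-distribution ones) where no recommendation should be made. All weights and thresholds here are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: one shared ReLU hidden layer feeding two
# outcome heads (treated / control), TARNet-style.
W_h = rng.normal(size=(5, 16))   # input -> hidden weights
w_t1 = rng.normal(size=16)       # hidden -> outcome under treatment
w_t0 = rng.normal(size=16)       # hidden -> outcome under control

def mc_dropout_cate(x, n_samples=200, p_drop=0.5):
    """Monte Carlo dropout estimate of the treatment effect for unit x.

    Each stochastic forward pass (dropout kept on) gives one sample of
    y1 - y0; the mean is the effect estimate and the standard deviation
    is the model's epistemic uncertainty.
    """
    effects = np.empty(n_samples)
    for i in range(n_samples):
        h = np.maximum(x @ W_h, 0.0)          # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop  # Bernoulli dropout mask
        h = h * mask / (1.0 - p_drop)         # inverted dropout scaling
        effects[i] = h @ w_t1 - h @ w_t0      # one sampled y1 - y0
    return effects.mean(), effects.std()

x_in = rng.normal(size=5)   # unit resembling the training data
x_out = 10.0 * x_in         # covariate-shifted unit, far from training

mean_in, std_in = mc_dropout_cate(x_in)
mean_out, std_out = mc_dropout_cate(x_out)

# A simple deferral rule: withhold the recommendation when the
# uncertainty exceeds a threshold (in practice, calibrated on
# validation data; the value here is arbitrary).
threshold = 5.0
print(f"in-distribution: effect={mean_in:.2f}, std={std_in:.2f}")
print(f"shifted unit:    effect={mean_out:.2f}, std={std_out:.2f}")
print("defer on shifted unit:", std_out > threshold)
```

The deferral step is the point: instead of always emitting a point estimate, the model abstains on units whose effect samples disagree, which is exactly the failure mode that arises under no-overlap and covariate shift.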
