Counterfactual Normalization: Proactively Addressing Dataset Shift Using Causal Mechanisms

Predictive models can fail to generalize from training to deployment environments because of dataset shift, which threatens model reliability in practice. Unlike previous methods, which use samples from the target distribution to correct dataset shift reactively, we propose using graphical knowledge of the causal mechanisms relating the variables in a prediction problem to proactively remove variables that participate in spurious associations with the prediction target, so that models generalize across datasets. To accomplish this, we augment the causal graph with latent counterfactual variables that account for the underlying causal mechanisms, and we show how these variables can be estimated. In our experiments, we demonstrate that models using good estimates of the latent variables in place of the observed variables transfer better from training to target domains, with minimal accuracy loss in the training domain.
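
The idea of replacing an observed variable with an estimate of a latent counterfactual one can be illustrated with a small simulation. This is only a sketch under strong assumptions that the abstract does not state: a linear, additive structural equation Y = T + b*A + noise, a binary variable A whose association with the target T differs between the training and target domains, and a least-squares estimate of b. All names here (simulate, run_demo, B_TRUE, and so on) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

B_TRUE = 2.0  # assumed effect of A on Y (illustrative value, not from the paper)

def simulate(n, p_a_given_t, rng):
    """Draw (T, A, Y); p_a_given_t = (P(A=1|T=0), P(A=1|T=1)) varies by domain."""
    t = rng.integers(0, 2, size=n).astype(float)
    p_a = np.where(t == 1.0, p_a_given_t[1], p_a_given_t[0])
    a = (rng.random(n) < p_a).astype(float)
    # Assumed additive mechanism: Y depends on T and A plus Gaussian noise.
    y = 1.0 * t + B_TRUE * a + rng.normal(0.0, 0.5, size=n)
    return t, a, y

def least_squares(columns, target):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    design = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def accuracy(coef, x, t):
    """Accuracy of thresholding the one-feature linear fit at 0.5."""
    pred = (coef[0] + coef[1] * x) > 0.5
    return float(np.mean(pred == (t == 1.0)))

def run_demo(seed=0, n=20000):
    rng = np.random.default_rng(seed)
    # The T-A association is reversed between domains (the spurious part).
    t_tr, a_tr, y_tr = simulate(n, (0.1, 0.9), rng)
    t_te, a_te, y_te = simulate(n, (0.9, 0.1), rng)

    # Estimate the effect of A on Y from training data only.
    b_hat = least_squares([t_tr, a_tr], y_tr)[2]

    # Estimated counterfactual Y(A=0): subtract the estimated effect of A.
    ycf_tr = y_tr - b_hat * a_tr
    ycf_te = y_te - b_hat * a_te

    naive = least_squares([y_tr], t_tr)          # predicts T from observed Y
    normalized = least_squares([ycf_tr], t_tr)   # predicts T from estimated Y(A=0)

    return {
        "b_hat": float(b_hat),
        "naive_train": accuracy(naive, y_tr, t_tr),
        "naive_target": accuracy(naive, y_te, t_te),
        "normalized_train": accuracy(normalized, ycf_tr, t_tr),
        "normalized_target": accuracy(normalized, ycf_te, t_te),
    }

if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting, the predictor built on the observed Y leans on the domain-specific T-A association and degrades sharply when that association changes, while the predictor built on the estimated counterfactual gives up some training accuracy and behaves similarly in both domains. Whether this carries over to a real problem depends on how well the assumed mechanism matches the true one.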
