CAUSALITY FROM A DISTRIBUTIONAL ROBUSTNESS POINT OF VIEW

We show some connections between distributional robustness and causal inference. Distributional robustness is understood here as optimizing predictive accuracy over a whole class of distributions rather than for a single target distribution. The class considered is often a ball around the empirical distribution, defined using a Wasserstein distance or a Kullback-Leibler divergence. Causal inference can be seen as a special case of distributional robustness in which the class of distributions is generated by a causal model. We discuss different choices for this class, which correspond to different assumptions about the type of interventions and the presence or absence of latent confounding variables.
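The link between the two views can be illustrated with a minimal sketch. It assumes a toy linear structural equation model X1 -> Y -> X2 with unit-variance Gaussian noise and shift interventions on X2; the model, the function names (`sample`, `mse`, `worst_case_risk`), and the grid of shifts are illustrative assumptions, not taken from the paper. The code compares the worst-case squared error of ordinary least squares with that of the structural (causal) coefficients over the class of shifted distributions.

```python
import numpy as np

def sample(n, shift, rng):
    """Draw n samples from a toy SEM X1 -> Y -> X2 (hypothetical example).

    `shift` is added to X2 and plays the role of a shift intervention.
    """
    x1 = rng.normal(size=n)
    y = 1.0 * x1 + rng.normal(size=n)      # structural equation for Y
    x2 = y + rng.normal(size=n) + shift    # X2 is a child of Y
    return np.column_stack([x1, x2]), y

def mse(beta, X, y):
    """Mean squared prediction error of the linear predictor X @ beta."""
    return float(np.mean((y - X @ beta) ** 2))

rng = np.random.default_rng(0)

# Fit least squares (no intercept) on observational data, i.e. shift = 0.
X_obs, y_obs = sample(5000, 0.0, rng)
beta_ols, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)

# Structural coefficients of Y on (X1, X2); known here only by construction.
beta_causal = np.array([1.0, 0.0])

# Class of distributions: the SEM under a grid of shift interventions on X2.
shifts = [0.0, 1.0, 3.0, 10.0]

def worst_case_risk(beta):
    """Largest empirical risk of beta over the assumed class of shifts."""
    return max(mse(beta, *sample(5000, s, rng)) for s in shifts)

risk_ols = worst_case_risk(beta_ols)
risk_causal = worst_case_risk(beta_causal)
print("observational risk (OLS, causal):",
      mse(beta_ols, X_obs, y_obs), mse(beta_causal, X_obs, y_obs))
print("worst-case risk    (OLS, causal):", risk_ols, risk_causal)
```

In this toy setting, least squares predicts better on the observational distribution because it exploits the child X2, but its risk grows with the size of the shift, whereas the risk of the structural coefficients stays at the noise variance of Y under every shift. This is the sense in which the causal parameters solve a worst-case problem over an intervention-generated class of distributions.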
