A causal framework for distribution generalization.

We consider the problem of predicting a response Y from a set of covariates X when the test and training distributions differ. Since such differences may have causal explanations, we consider test distributions that emerge from interventions in a structural causal model, and focus on minimizing the worst-case risk. Causal regression models, which regress the response on its direct causes, remain unchanged under arbitrary interventions on the covariates, but they are not always optimal in the above sense. For example, for linear models and bounded interventions, alternative solutions have been shown to be minimax optimal for prediction. We introduce a formal framework of distribution generalization that allows us to analyze the above problem in partially observed nonlinear models, both for direct interventions on X and for interventions that occur indirectly via exogenous variables A. The framework takes into account that, in practice, minimax solutions need to be identified from data. It allows us to characterize the classes of interventions under which the causal function is minimax optimal. We prove sufficient conditions for distribution generalization and present corresponding impossibility results. We propose a practical method, NILE, that achieves distribution generalization in a nonlinear IV setting with linear extrapolation. We prove its consistency and present empirical results.
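The invariance of the causal regression under interventions on X can be illustrated with a small simulation. The following is a minimal sketch, not the paper's NILE method: the toy linear SCM, its coefficients, and the helper names `simulate` and `risk` are assumptions chosen for illustration only. It compares the squared-error risk of the causal function with that of an ordinary least-squares fit as interventions on X grow stronger.

```python
import numpy as np

def simulate(n, rng, intervene_scale=None):
    """Draw n samples from a toy linear SCM with a hidden confounder H (assumed model).

    Observational:  X := A + H + 0.5 * noise,   Y := 2 * X + H + 0.5 * noise
    Intervention:   X := intervene_scale * N(0, 1), replacing the assignment for X
    """
    h = rng.normal(size=n)  # hidden confounder of X and Y
    a = rng.normal(size=n)  # exogenous variable A
    if intervene_scale is None:
        x = a + h + 0.5 * rng.normal(size=n)
    else:
        x = intervene_scale * rng.normal(size=n)
    y = 2.0 * x + h + 0.5 * rng.normal(size=n)
    return x, y

def risk(slope, x, y):
    """Mean squared prediction error of the linear predictor x -> slope * x."""
    return float(np.mean((y - slope * x) ** 2))

rng = np.random.default_rng(0)

# Fit OLS (no intercept; all variables have mean zero) on observational data.
x_train, y_train = simulate(100_000, rng)
ols_slope = float(np.dot(x_train, y_train) / np.dot(x_train, x_train))
causal_slope = 2.0  # true causal coefficient, known here by construction

# Evaluate both predictors on the observational and on intervened distributions.
risks = {}
for scale in [None, 1.0, 5.0, 20.0]:
    x_test, y_test = simulate(100_000, rng, scale)
    risks[scale] = (risk(causal_slope, x_test, y_test),
                    risk(ols_slope, x_test, y_test))

for scale, (r_causal, r_ols) in risks.items():
    label = "observational" if scale is None else f"do(X), scale={scale}"
    print(f"{label:>22}: causal risk = {r_causal:.2f}, OLS risk = {r_ols:.2f}")
```

In this toy model, the OLS fit attains a lower risk on the observational distribution because it exploits the confounding, but its risk grows with the intervention scale, whereas the risk of the causal function stays at the variance of the unexplained noise. This matches the worst-case argument for causal regression under arbitrary interventions on the covariates.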
