Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance, or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to linear regression, we show that linear classification requires much stronger restrictions on the distribution shifts; otherwise, OOD generalization is impossible. Furthermore, even with appropriate restrictions on the distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint, combined with invariance, addresses the key failures when invariant features capture all the information about the label, while retaining the existing successes when they do not. We propose an approach that incorporates both principles and demonstrate its effectiveness in several experiments.
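The abstract does not spell out the combined objective, so the sketch below shows one plausible form: an empirical risk term plus an IRMv1-style gradient penalty for invariance plus an information-bottleneck surrogate, implemented here as the variance of the learned representation. The function names, hyperparameters (`lam`, `gamma`), and the specific choice of bottleneck surrogate are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.autograd as autograd


def irm_penalty(logits, labels):
    """IRMv1-style penalty: squared gradient of the environment risk
    with respect to a fixed scalar "dummy" classifier."""
    scale = torch.tensor(1.0, requires_grad=True)
    loss = nn.functional.binary_cross_entropy_with_logits(logits * scale, labels)
    grad = autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()


def ib_irm_loss(featurizer, classifier, envs, lam=1.0, gamma=1.0):
    """Sketch of a combined invariance + information-bottleneck objective.

    envs: list of (x, y) batches, one per training environment,
          with y a float tensor of binary labels in {0, 1}.
    lam:  weight on the invariance (IRM) penalty.
    gamma: weight on the bottleneck surrogate.
    """
    risk, inv_penalty, ib_penalty = 0.0, 0.0, 0.0
    for x, y in envs:
        z = featurizer(x)                      # learned representation Phi(x)
        logits = classifier(z).squeeze(-1)
        risk += nn.functional.binary_cross_entropy_with_logits(logits, y)
        inv_penalty += irm_penalty(logits, y)  # invariance term per environment
        ib_penalty += z.var(dim=0).mean()      # bottleneck surrogate: representation variance
    n = len(envs)
    return risk / n + lam * inv_penalty / n + gamma * ib_penalty / n
```

The variance term is one common proxy for constraining the information carried by the representation; other surrogates (e.g., an explicit entropy estimate) could be substituted without changing the structure of the objective.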
