Out-of-Distribution Generalization with Maximal Invariant Predictor

Out-of-Distribution (OOD) generalization problem is a problem of seeking the predictor function whose performance in the worst environments is optimal. This paper makes two contributions to OOD problem. We first use the basic results of probability to prove maximal Invariant Predictor(MIP) condition, a theoretical result that can be used to identify the OOD optimal solution. We then use our MIP to derive inner-environmental Gradient Alignment(IGA) algorithm that can be used to help seek the OOD optimal predictor. Previous studies that have investigated the theoretical aspect of the OOD-problem use strong structural assumptions such as causal DAG. However, in cases involving image datasets, for example, the identification of hidden structural relations is itself a difficult problem. Our theoretical results are different from those of many previous studies in that it can be applied to cases in which the underlying structure of a dataset is difficult to analyze. We present an extensive comparison of previous theoretical approaches to the OODproblems based on the assumptions they make. We also present an extension of the colored-MNIST that can more accurately represent the pathological OOD situation than the original version, and demonstrate the superiority of IGA over previous methods on both the original and the extended version of Colored-MNIST.

[1]  Bo Li,et al.  Causally Regularized Learning with Agnostic Data Selection Bias , 2017, ACM Multimedia.

[2]  Bernhard Schölkopf,et al.  Invariant Models for Causal Transfer Learning , 2015, J. Mach. Learn. Res..

[3]  J. Pearl Causal inference in statistics: An overview , 2009 .

[4]  D. Tao,et al.  Deep Domain Generalization via Conditional Invariant Adversarial Networks , 2018, ECCV.

[5]  Amos Storkey,et al.  When Training and Test Sets are Different: Characterising Learning Transfer , 2013 .

[6]  H. Shimodaira,et al.  Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .

[7]  Christopher Joseph Pal,et al.  A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms , 2019, ICLR.

[8]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[9]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[10]  Neil D. Lawrence,et al.  Dataset Shift in Machine Learning , 2009 .

[11]  Aleksander Madry,et al.  Adversarial Examples Are Not Bugs, They Are Features , 2019, NeurIPS.

[12]  Anja De Waegenaere,et al.  Robust Solutions of Optimization Problems Affected by Uncertain Probabilities , 2011, Manag. Sci..

[13]  Matthias Bethge,et al.  ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.

[14]  Gang Niu,et al.  Does Distributionally Robust Supervised Learning Give Robust Classifiers? , 2016, ICML.

[15]  Jonas Peters,et al.  Causal inference by using invariant prediction: identification and confidence intervals , 2015, 1501.01332.

[16]  Daniel Kuhn,et al.  Distributionally Robust Logistic Regression , 2015, NIPS.

[17]  Bernhard Schölkopf,et al.  Identifiability of Causal Graphs using Functional Models , 2011, UAI.

[18]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[19]  R. Durrett Probability: Theory and Examples , 1993 .

[20]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[21]  P. Bühlmann,et al.  Invariance, Causality and Robustness , 2018, Statistical Science.

[22]  Suchi Saria,et al.  Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport , 2018, AISTATS.

[23]  Kun Zhang,et al.  On Learning Invariant Representation for Domain Adaptation , 2019, ArXiv.

[24]  Neil D. Lawrence,et al.  When Training and Test Sets Are Different: Characterizing Learning Transfer , 2009 .

[25]  N. Meinshausen,et al.  Anchor regression: Heterogeneous data meet causality , 2018, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[26]  Alex ChiChung Kot,et al.  Domain Generalization with Adversarial Feature Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[28]  Joris M. Mooij,et al.  Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions , 2017, NeurIPS.

[29]  Dacheng Tao,et al.  Domain Generalization via Conditional Invariant Representations , 2018, AAAI.

[30]  Qi Tian,et al.  Domain-Invariant Adversarial Learning for Unsupervised Domain Adaption , 2018, ArXiv.

[31]  Xin Guo,et al.  On the optimality of conditional expectation as a Bregman predictor , 2005, IEEE Trans. Inf. Theory.

[32]  Enrique F. Castillo,et al.  Expert Systems and Probabilistic Network Models , 1996, Monographs in Computer Science.

[33]  Maxime Gasse,et al.  Probabilistic Graphical Model Structure Learning: Application to Multi-Label Classification. (Apprentissage de Structure de Modèles Graphiques Probabilistes: Application à la Classification Multi-Label) , 2017 .

[34]  Kristina Lerman,et al.  A Survey on Bias and Fairness in Machine Learning , 2019, ACM Comput. Surv..

[35]  Amir Najafi,et al.  Robustness to Adversarial Perturbations in Learning from Incomplete Data , 2019, NeurIPS.

[36]  Tommi S. Jaakkola,et al.  Invariant Rationalization , 2020, ICML.

[37]  Peter J. F. Lucas,et al.  Advances in Probabilistic Graphical Models , 2008 .

[38]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[39]  Stefano Soatto,et al.  Emergence of Invariance and Disentanglement in Deep Representations , 2017, 2018 Information Theory and Applications Workshop (ITA).

[40]  Gerald B. Folland,et al.  Real Analysis: Modern Techniques and Their Applications , 1984 .

[41]  Rajesh Ranganath,et al.  Support and Invertibility in Domain-Invariant Representations , 2019, AISTATS.