Causal Balancing for Domain Generalization

While machine learning models rapidly advance the state of the art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a causally motivated balanced mini-batch sampling strategy that uses multiple training sets from different environments to train robust classifiers that are minimax optimal across a sufficiently diverse environment space. We provide an identifiability guarantee for the latent covariates in the proposed causal graph and show that, under an ideal scenario, our approach samples training data from a balanced, spurious-free distribution. Experiments on three domain generalization datasets demonstrate empirically that our balanced mini-batch sampling strategy improves the performance of four established domain generalization baselines compared to random mini-batch sampling.
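To make the idea of balanced mini-batch sampling concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes the spurious factor is fully captured by an observed environment index, whereas the abstract refers to latent covariates that must first be identified. Under that simplifying assumption, it draws mini-batch indices so that every (environment, label) pair is equally likely, which removes the label-environment correlation in expectation. The function name `balanced_batch_indices` and its parameters are hypothetical.

```python
import numpy as np


def balanced_batch_indices(labels, envs, batch_size, rng):
    """Sample indices so each observed (environment, label) cell is equally likely.

    Assumption: the environment index stands in for the spurious factor.
    This is an illustrative simplification, not the paper's procedure.
    """
    labels = np.asarray(labels)
    envs = np.asarray(envs)

    # Map raw labels and environments to contiguous integer codes.
    _, env_idx = np.unique(envs, return_inverse=True)
    _, lab_idx = np.unique(labels, return_inverse=True)

    # One group id per (environment, label) pair.
    group_ids = env_idx * (lab_idx.max() + 1) + lab_idx
    counts = np.bincount(group_ids)

    # Inverse-frequency weights: each non-empty cell gets equal total mass.
    weights = 1.0 / counts[group_ids]
    probs = weights / weights.sum()

    return rng.choice(len(labels), size=batch_size, replace=True, p=probs)


if __name__ == "__main__":
    # Toy data: label 0 dominates environment 0, label 1 dominates environment 1.
    toy_labels = np.array([0] * 90 + [1] * 10 + [0] * 10 + [1] * 90)
    toy_envs = np.array([0] * 100 + [1] * 100)
    batch = balanced_batch_indices(
        toy_labels, toy_envs, batch_size=32, rng=np.random.default_rng(0)
    )
    print(toy_labels[batch], toy_envs[batch])
```

Sampling with replacement keeps the sketch short. A random mini-batch sampler would instead reproduce the skewed label-environment proportions of the raw training sets.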
