Improving Out-of-Distribution Robustness via Selective Augmentation

Machine learning algorithms typically assume that training and test examples are drawn from the same distribution. However, distribution shift is common in real-world applications and can cause models to perform dramatically worse at test time. In this paper, we specifically consider the problems of domain shifts and subpopulation shifts (e.g., imbalanced data). While prior works often seek to explicitly regularize a model’s internal representations and predictors to be domain invariant, we instead aim to regularize the whole function without restricting the model’s internal representations. This leads to LISA, a simple mixup-based technique that learns invariant functions via selective augmentation. LISA selectively interpolates samples either with the same label but different domains, or with the same domain but different labels. We analyze a linear setting and theoretically show how LISA leads to a smaller worst-group error. Empirically, we study the effectiveness of LISA on nine benchmarks ranging from subpopulation shifts to domain shifts, and find that LISA consistently outperforms other state-of-the-art methods.
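To make the selective-augmentation idea concrete, the following is a minimal NumPy sketch of the two interpolation strategies the abstract describes. The function names, the Beta(α, α) sampling of the mixing ratio, and the fallback when no eligible partner exists are illustrative assumptions that follow standard mixup conventions; this is not the authors' released implementation.

```python
import numpy as np

def one_hot(label, num_classes):
    """Return a one-hot vector for an integer class label."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

def lisa_interpolate(x, y, d, num_classes, strategy="intra_label",
                     alpha=2.0, rng=None):
    """Produce one LISA-style mixed example (illustrative sketch).

    x: (n, feature_dim) inputs; y: (n,) integer labels; d: (n,) integer domains.
    strategy "intra_label":  partner shares the anchor's label, differs in domain.
    strategy "intra_domain": partner shares the anchor's domain, differs in label.
    """
    rng = rng or np.random.default_rng()
    i = rng.integers(len(x))                 # anchor example
    if strategy == "intra_label":
        mask = (y == y[i]) & (d != d[i])
    else:
        mask = (d == d[i]) & (y != y[i])
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:                 # no eligible partner: skip mixing
        return x[i], one_hot(y[i], num_classes)
    j = rng.choice(candidates)               # selectively chosen partner
    lam = rng.beta(alpha, alpha)             # mixing ratio, as in standard mixup
    x_mix = lam * x[i] + (1.0 - lam) * x[j]
    y_mix = (lam * one_hot(y[i], num_classes)
             + (1.0 - lam) * one_hot(y[j], num_classes))
    # Under "intra_label", y[i] == y[j], so the label stays hard while the
    # domain cue is interpolated away; under "intra_domain" the reverse holds.
    return x_mix, y_mix
```

A training loop would draw such mixed pairs at each step and train with the usual cross-entropy against the soft labels; the intra-label strategy cancels domain information for a fixed label, while the intra-domain strategy cancels domain information as a predictor of the label.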
