Out-of-Distribution Generalization via Risk Extrapolation (REx)

Generalizing outside of the training distribution is an open challenge for current machine learning systems. A weak form of out-of-distribution (OoD) generalization is the ability to successfully interpolate between multiple observed distributions. One way to achieve this is through robust optimization, which seeks to minimize the worst-case risk over convex combinations of the training distributions. However, a much stronger form of OoD generalization is the ability of models to extrapolate beyond the distributions observed during training. In pursuit of strong OoD generalization, we introduce the principle of Risk Extrapolation (REx). REx can be viewed as encouraging robustness over affine combinations of training risks, by encouraging strict equality between training risks. We show conceptually how this principle enables extrapolation, and demonstrate the effectiveness and scalability of instantiations of REx on various OoD generalization tasks. Our code can be found at this https URL.
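As a concrete illustration of the principle described above, one instantiation of REx (Variance-REx, or V-REx) penalizes the variance of the risks across training domains, which pushes the training risks toward the strict equality the abstract refers to. The following is a minimal sketch in PyTorch, assuming the per-domain risks have already been computed; the function name vrex_objective and the default penalty weight beta are illustrative choices, not names taken from the paper's released code.

```python
import torch

def vrex_objective(per_domain_risks, beta=10.0):
    """Illustrative V-REx-style objective: mean training risk plus a penalty
    on the variance of the risks across domains, encouraging equal risks."""
    risks = torch.stack(per_domain_risks)        # one scalar risk per training domain
    mean_risk = risks.mean()
    variance = ((risks - mean_risk) ** 2).mean() # spread of the per-domain risks
    return mean_risk + beta * variance

# Hypothetical usage with one batch per training domain:
# risks = [loss_fn(model(x_e), y_e) for (x_e, y_e) in domain_batches]
# loss = vrex_objective(risks, beta=10.0)
# loss.backward()
```

With beta = 0 this reduces to ordinary empirical risk minimization over the pooled domains; as beta grows, equality of the training risks is enforced more strictly, which is how the penalty extends robustness from convex combinations toward the affine combinations of training risks mentioned above.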
