Assaying Out-Of-Distribution Generalization in Transfer Learning

Since out-of-distribution (OOD) generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) have been studied across different research programs, resulting in divergent recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and provide recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and OOD evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks from nine different architectures in the many- and few-shot settings, gaining a broader insight into the sometimes contradictory statements on OOD robustness in previous research. Our findings confirm that in-distribution and OOD accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller-scale studies. We organize our study around two key questions: (1) What are good proxy measures of OOD robustness when only a single dataset is available? (2) How do architecture choices and fine-tuning strategies affect robustness? We plan to publish the code with the camera-ready version of the paper.
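To make the evaluation protocol concrete, the sketch below illustrates two of the measurements the study revolves around: expected calibration error (ECE) for a single model, and the Spearman rank correlation between in-distribution and OOD accuracy across a pool of fine-tuned models. This is a minimal sketch, not the paper's actual pipeline; the arrays `id_accuracy` and `ood_accuracy` are filled with synthetic toy numbers purely for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def expected_calibration_error(probs, labels, n_bins=15):
    """Equal-width-binning ECE: weighted average gap between mean
    confidence and accuracy within each confidence bin.
    probs: (N, C) predicted class probabilities; labels: (N,) int labels."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Bin weight is the fraction of samples falling in the bin.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

# Rank correlation between ID and OOD accuracy across a model pool.
# Synthetic stand-in values; in the study these would come from the
# fine-tuned networks evaluated on a train/OOD dataset pair.
rng = np.random.default_rng(0)
id_accuracy = rng.uniform(0.6, 0.95, size=50)            # hypothetical ID accuracies
ood_accuracy = id_accuracy - rng.uniform(0.0, 0.2, 50)   # hypothetical OOD drop
rho, pval = spearmanr(id_accuracy, ood_accuracy)
print(f"Spearman rho between ID and OOD accuracy: {rho:.2f} (p={pval:.1e})")
```

A high rank correlation on such a plot is what "accuracy on the line" refers to; the paper's claim is that the strength of this relation, and the usefulness of proxies like ECE, varies considerably across dataset pairs.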
