The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

We introduce four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, and more. With our new datasets, we take stock of previously proposed methods for improving out-of-distribution robustness and put them to the test. We find that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, contrary to claims in prior work. We also find that improvements in artificial robustness benchmarks can transfer to real-world distribution shifts, again contrary to claims in prior work. Motivated by our observation that data augmentations can help with real-world distribution shifts, we also introduce a new data augmentation method that advances the state of the art and outperforms models pre-trained with 1000× more labeled data. Overall, we find that some methods consistently help with distribution shifts in texture and local image statistics, but these methods do not help with some other distribution shifts, such as geographic changes. Our results show that future research must study multiple distribution shifts simultaneously, as we demonstrate that no evaluated method consistently improves robustness.
