Mind the Backbone: Minimizing Backbone Distortion for Robust Object Detection

Building object detectors that are robust to domain shifts is critical for real-world applications. Prior approaches fine-tune a pre-trained backbone, which risks overfitting it to in-distribution (ID) data and distorting features that are useful for out-of-distribution (OOD) generalization. We propose Relative Gradient Norm (RGN) as a measure of a backbone's vulnerability to feature distortion, and show that high RGN is correlated with lower OOD performance. Our analysis of RGN yields two findings: some backbones lose OOD robustness during fine-tuning, while others gain robustness because their architecture prevents the parameters from moving far from the initial model. Given these findings, we present recipes to boost OOD robustness for both types of backbones. Specifically, we investigate regularization and architectural choices that minimize gradient updates, so that the tuned backbone does not lose generalizable features. Our proposed techniques complement each other and show substantial improvements over baselines on diverse architectures and datasets. Code is available at https://github.com/VisionLearningGroup/mind_back.
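
The abstract does not give a formula for RGN. As a minimal sketch, the snippet below assumes RGN is the ratio of the L2 norm of a layer's gradient to the L2 norm of that layer's parameters, averaged over layers; that definition, the function names, and the toy numbers are illustrative assumptions, not taken from the paper or its released code.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def relative_gradient_norm(grads, params, eps=1e-12):
    """Assumed definition: ||gradient|| / ||parameters|| for one layer."""
    return l2_norm(grads) / (l2_norm(params) + eps)


def backbone_rgn(layers):
    """Compute per-layer RGN and its mean.

    `layers` maps a (hypothetical) layer name to a (grads, params) pair,
    each given as a flat list of floats.
    """
    per_layer = {
        name: relative_gradient_norm(grads, params)
        for name, (grads, params) in layers.items()
    }
    mean_rgn = sum(per_layer.values()) / len(per_layer)
    return per_layer, mean_rgn


# Toy values, chosen only so the arithmetic is easy to follow.
layers = {
    "stem": ([0.3, 0.4], [3.0, 4.0]),     # gradient norm 0.5, parameter norm 5
    "stage1": ([0.0, 0.05], [0.0, 5.0]),  # gradient norm 0.05, parameter norm 5
}
per_layer, mean_rgn = backbone_rgn(layers)
for name, value in per_layer.items():
    print(f"{name}: {value:.4f}")
print(f"mean RGN: {mean_rgn:.4f}")
```

Under this assumed definition, a layer with a larger ratio receives updates that are large relative to its current weights, which is the situation the abstract associates with feature distortion.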
