ResNet strikes back: An improved training procedure in timm

The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization and data-augmentation techniques have increased the effectiveness of training recipes. In this paper, we re-evaluate the performance of the vanilla ResNet-50 when trained with a procedure that integrates such advances. We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work. For instance, with our more demanding training setting, a vanilla ResNet-50 reaches 80.4% top-1 accuracy at resolution 224×224 on ImageNet-val without extra data or distillation. We also report the performance achieved by other popular models trained with our procedure.
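Since the pre-trained models are distributed through timm, a minimal sketch of loading a ResNet-50 and running inference at 224×224 is shown below. This assumes `timm`, `torch`, and `pillow` are installed and that an image file `example.jpg` exists; the exact checkpoint served by `pretrained=True` may differ from the specific recipe described in the paper.

```python
# Minimal sketch: load a pretrained ResNet-50 from timm and classify one image.
# Assumptions: `pip install timm torch pillow`; "example.jpg" is a placeholder path.
import torch
import timm
from PIL import Image

model = timm.create_model("resnet50", pretrained=True)
model.eval()

# Build the preprocessing pipeline from the model's pretrained data config
# (resize/crop to 224x224, normalization with ImageNet statistics).
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

img = Image.open("example.jpg").convert("RGB")
x = transform(img).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(x)

# Report the five most likely ImageNet classes.
top5 = logits.softmax(dim=-1).topk(5)
print(top5.indices.tolist(), top5.values.tolist())
```

The same `timm.create_model` call accepts other architecture names, which is how the additional popular models evaluated with this training procedure can be loaded for comparison.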
