Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

ImageNet has been the most popular image classification benchmark, but it also suffers from a significant level of label noise. Recent studies have shown that many samples contain multiple classes, even though the benchmark is assumed to be single-label. These studies have thus proposed to turn ImageNet evaluation into a multi-label task, with exhaustive multi-label annotations per image. However, they have not fixed the training set, presumably because of the formidable annotation cost. We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied. Under single-label annotations, a random crop of an image may contain an entirely different object from the ground truth, introducing noisy or even incorrect supervision during training. We therefore re-label the ImageNet training set with multi-labels. We address the annotation cost barrier by letting a strong image classifier, trained on an extra source of data, generate the multi-labels. We use its pixel-wise multi-label predictions before the final pooling layer in order to exploit the additional location-specific supervision signals. Training on the re-labeled samples improves model performance across the board. ResNet-50 attains a top-1 accuracy of 78.9% on ImageNet with our localized multi-labels, which can be further boosted to 80.2% with the CutMix regularization. We show that models trained with localized multi-labels also outperform the baselines on transfer learning to object detection and instance segmentation tasks, and on various robustness benchmarks. The re-labeled ImageNet training set, pre-trained weights, and the source code are available at https://github.com/naver-ai/relabel_imagenet.
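To make the recipe concrete, below is a minimal PyTorch sketch of the two steps implied above: obtaining a localized label map from the machine annotator (its pixel-wise class predictions taken before global pooling) and pooling that map over each random crop to form a crop-specific soft multi-label target. The annotator's `forward_features`/`fc` interface, the use of `torchvision.ops.roi_align`, and the plain softmax target are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of localized multi-label supervision ("LabelPooling").
# Assumptions (not the authors' exact code): the annotator exposes a
# timm-style forward_features() plus a linear head `fc`, and we pool the
# label map over the crop region with torchvision's roi_align.

import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

@torch.no_grad()
def make_label_map(annotator, image):
    """Pixel-wise class predictions taken *before* the final pooling layer."""
    feat = annotator.forward_features(image)          # (1, D, H, W)
    logits = annotator.fc(feat.permute(0, 2, 3, 1))   # linear head per pixel: (1, H, W, C)
    return logits.permute(0, 3, 1, 2)                 # label map: (1, C, H, W)

def crop_soft_target(label_map, crop_box):
    """Pool the label map over one random crop to get its soft multi-label."""
    # crop_box: float tensor (x1, y1, x2, y2) in label-map coordinates; batch index 0.
    box = torch.cat([torch.zeros(1, 1), crop_box.view(1, 4)], dim=1)
    pooled = roi_align(label_map, box, output_size=(1, 1)).flatten(1)  # (1, C)
    return F.softmax(pooled, dim=1)                    # crop-specific soft target

def relabel_loss(student_logits, soft_target):
    """Cross-entropy against the soft multi-label target."""
    return -(soft_target * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()
```

In the paper's pipeline, the label maps are precomputed once over the whole training set (keeping only the top-scoring classes per image to save storage), so the annotator never needs to run inside the training loop.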
