Paper information - Grafit: Learning fine-grained image representations with coarse labels

Grafit: Learning fine-grained image representations with coarse labels

This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying images at a finer granularity than that available at train time. It also improves the accuracy for transfer learning tasks to fine-grained datasets, thereby establishing the new state of the art on five public benchmarks, such as iNaturalist-2018.
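The abstract mentions a nearest-neighbor classifier over learned embeddings. As a rough illustration only, the sketch below shows how a k-nearest-neighbor vote over stored embeddings could assign a fine-grained label to a query at test time. It is not the paper's training objective or implementation: the function names, the `temperature` value, the exponential vote weighting, and the toy two-dimensional embeddings and labels are all assumptions made for this example.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, embeddings, labels, k=3, temperature=0.05):
    """Predict a label by a weighted vote among the k most similar embeddings.

    Each neighbor votes for its label with weight exp(similarity / temperature).
    The weighting scheme and default temperature are illustrative assumptions.
    """
    # Rank stored embeddings by similarity to the query and keep the top k.
    neighbors = sorted(
        ((cosine(query, e), y) for e, y in zip(embeddings, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, label in neighbors:
        votes[label] += math.exp(similarity / temperature)
    return max(votes, key=votes.get)


# Hypothetical toy data: 2-D embeddings with fine-grained labels.
bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
fine_labels = ["sparrow", "sparrow", "finch", "finch"]

prediction = knn_predict([0.8, 0.2], bank, fine_labels, k=3)
print(prediction)
```

In this toy setup the query lies close to the two "sparrow" embeddings, so they dominate the vote. In practice the embeddings would come from a trained network rather than hand-written vectors.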
