Web-Supervised Network for Fine-Grained Visual Classification

Fine-grained visual classification (FGVC) is a tough task due to its high annotation cost of the fine-grained subcategories. To build a large-scale dataset at low manual cost, straightforwardly learning from web images for FGVC has attracted broad attention. However, there exist two characteristics in the need of concerning for the web dataset: 1) Noisy images; 2) A large proportion of hard examples. In this paper, we propose a simple yet effective approach to deal with noisy images and hard examples during training. Our method is a pure web-supervised method for FGVC. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach is much superior to the state-of-the-art web-supervised methods. The data and source code of this work have been posted available at: https://github.com/NUST-Machine-Intelligence-Laboratory/WSNFG.

[1]  Tao Li,et al.  Collaborative representation based local discriminant projection for feature extraction , 2018, Digit. Signal Process..

[2]  Heng Tao Shen,et al.  Exploiting Web Images for Multi-Output Classification: From Category to Subcategories , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[3]  Ling Shao,et al.  Dynamically Visual Disambiguation of Keyword-based Image Search , 2019, IJCAI.

[4]  Ling Shao,et al.  Extracting Multiple Visual Senses for Web Learning , 2019, IEEE Transactions on Multimedia.

[5]  Ling Shao,et al.  Motion-Attentive Transition for Zero-Shot Video Object Segmentation , 2020, AAAI.

[6]  Jian Zhang,et al.  Towards Automatic Construction of Diverse, High-Quality Image Datasets , 2017, IEEE Transactions on Knowledge and Data Engineering.

[7]  Jian Zhang,et al.  A new web-supervised method for image dataset constructions , 2017, Neurocomputing.

[8]  Jian Cheng,et al.  Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[9]  Guosheng Lin,et al.  SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Jian Zhang,et al.  Extracting Privileged Information from Untagged Corpora for Classifier Learning , 2018, IJCAI.

[12]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Jian Zhang,et al.  Exploiting Web Images for Dataset Construction: A Domain Robust Approach , 2016, IEEE Transactions on Multimedia.

[14]  Ling Shao,et al.  Approximate Kernel Selection via Matrix Approximation , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[15]  Xiaobo Jin,et al.  Attentive Region Embedding Network for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Ling Shao,et al.  Extracting Privileged Information for Enhancing Classifier Learning , 2019, IEEE Transactions on Image Processing.

[17]  Yoshua Bengio,et al.  A Closer Look at Memorization in Deep Networks , 2017, ICML.

[18]  Xingrui Yu,et al.  Co-teaching: Robust training of deep neural networks with extremely noisy labels , 2018, NeurIPS.

[19]  Jian Zhang,et al.  Automatic image dataset construction with multiple textual metadata , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[20]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ashok Veeraraghavan,et al.  Webly Supervised Learning Meets Zero-shot Learning: A Hybrid Approach for Fine-Grained Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Xiu-Shen Wei,et al.  Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization , 2018, Pattern Recognit..

[23]  Jian Zhang,et al.  Discovering and Distinguishing Multiple Visual Senses for Polysemous Words , 2018, AAAI.

[24]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Feng Zhou,et al.  Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Joachim Denzler,et al.  Classification-Specific Parts for Improving Fine-Grained Visual Categorization , 2019, GCPR.

[27]  Larry S. Davis,et al.  Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Carlos D. Castillo,et al.  L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[29]  Dong Xu,et al.  Visual recognition by learning from web data: A weakly supervised domain generalization approach , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Shai Shalev-Shwartz,et al.  Decoupling "when to update" from "how to update" , 2017, NIPS.

[31]  Jian Zhang,et al.  A Domain Robust Approach For Image Dataset Construction , 2016, ACM Multimedia.

[32]  Yang Song,et al.  Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Zhenmin Tang,et al.  Exploiting textual and visual features for image categorization , 2019, Pattern Recognit. Lett..

[34]  Zheng Zhang,et al.  Web-Supervised Network with Softly Update-Drop Training for Fine-Grained Visual Classification , 2020, AAAI.

[35]  Dacheng Tao,et al.  Webly-Supervised Fine-Grained Visual Categorization via Deep Domain Adaptation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).