Exploiting Web Images for Fine-Grained Visual Recognition by Eliminating Noisy Samples and Utilizing Hard Ones

Labeling objects at a subordinate level typically requires expert knowledge, which is not always available when using random annotators. As such, learning directly from web images for fine-grained recognition has attracted broad attention. However, the presence of label noise and hard examples in web images are two obstacles for training robust fine-grained recognition models. Therefore, in this paper, we propose a novel approach for removing irrelevant samples from real-world web images during training, while employing useful hard examples to update the network. Thus, our approach can alleviate the harmful effects of irrelevant noisy web images and hard examples to achieve better performance. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach is far superior to current state-of-the-art web-supervised methods. The data and source code of this work have been made publicly available at: https://github.com/NUST-Machine-Intelligence-Laboratory/ Advanced-Softly-Update-Drop.

[1]  Dong Xu,et al.  Visual recognition by learning from web data: A weakly supervised domain generalization approach , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Yoshua Bengio,et al.  A Closer Look at Memorization in Deep Networks , 2017, ICML.

[4]  Jianfeng Lu,et al.  Hsi Road: A Hyper Spectral Image Dataset For Road Segmentation , 2020, 2020 IEEE International Conference on Multimedia and Expo (ICME).

[5]  Tao Chen,et al.  Classification Constrained Discriminator For Domain Adaptive Semantic Segmentation , 2020, 2020 IEEE International Conference on Multimedia and Expo (ICME).

[6]  Shai Shalev-Shwartz,et al.  Decoupling "when to update" from "how to update" , 2017, NIPS.

[7]  Yongshun Gong,et al.  Field-wise Learning for Multi-field Categorical Data , 2020, NeurIPS.

[8]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[9]  Dacheng Tao,et al.  Webly-Supervised Fine-Grained Visual Categorization via Deep Domain Adaptation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Xiu-Shen Wei,et al.  CRSSC: Salvage Reusable Samples from Noisy Data for Robust Learning , 2020, ACM Multimedia.

[11]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[12]  Haofeng Zhang,et al.  Set and Rebase: Determining the Semantic Graph Connectivity for Unsupervised Cross-Modal Hashing , 2020, IJCAI.

[13]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[14]  Ling Shao,et al.  Motion-Attentive Transition for Zero-Shot Video Object Segmentation , 2020, AAAI.

[15]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Xiu-Shen Wei,et al.  PyRetri: A PyTorch-based Library for Unsupervised Image Retrieval by Deep Convolutional Neural Networks , 2020, ACM Multimedia.

[17]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[19]  Li Fei-Fei,et al.  MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels , 2017, ICML.

[20]  Kun Yi,et al.  Probabilistic End-To-End Noise Correction for Learning With Noisy Labels , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Xiu-Shen Wei,et al.  Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization , 2018, Pattern Recognit..

[22]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[24]  Ashok Veeraraghavan,et al.  Webly Supervised Learning Meets Zero-shot Learning: A Hybrid Approach for Fine-Grained Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Zechao Li,et al.  Data-driven Meta-set Based Fine-Grained Visual Recognition , 2020, ACM Multimedia.

[26]  Ya Zhang,et al.  Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Michael Lam,et al.  Fine-Grained Recognition as HSnet Search for Informative Image Parts , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Jian Zhang,et al.  Towards Automatic Construction of Diverse, High-Quality Image Datasets , 2017, IEEE Transactions on Knowledge and Data Engineering.

[29]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jian Cheng,et al.  Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[31]  Jiebo Luo,et al.  Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Le Song,et al.  Iterative Learning with Open-set Noisy Labels , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Jian Zhang,et al.  Automatic image dataset construction with multiple textual metadata , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[34]  Feng Zhou,et al.  Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Ling Shao,et al.  Region Graph Embedding Network for Zero-Shot Learning , 2020, ECCV.

[36]  R. Srikant,et al.  Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks , 2017, ICLR.

[37]  Ling Shao,et al.  Extracting Privileged Information for Enhancing Classifier Learning , 2019, IEEE Transactions on Image Processing.

[38]  Jian Zhang,et al.  A Domain Robust Approach For Image Dataset Construction , 2016, ACM Multimedia.

[39]  Jacob Goldberger,et al.  Training deep neural-networks using a noise adaptation layer , 2016, ICLR.

[40]  Xiaobo Jin,et al.  Attentive Region Embedding Network for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yao Li,et al.  Attend in Groups: A Weakly-Supervised Deep Learning Framework for Learning from Web Data , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jian Zhang,et al.  Exploiting Web Images for Dataset Construction: A Domain Robust Approach , 2016, IEEE Transactions on Multimedia.

[43]  Richard Nock,et al.  Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jianbing Shen,et al.  Target-Aware Adaptive Tracking for Unsupervised Video Object Segmentation , 2020 .

[45]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[46]  Guanyu Gao,et al.  Bridging the Web Data and Fine-Grained Visual Recognition via Alleviating Label Noise and Domain Mismatch , 2020, ACM Multimedia.

[47]  Ling Shao,et al.  Deep Unsupervised Self-Evolutionary Hashing for Image Retrieval , 2021, IEEE Transactions on Multimedia.

[48]  Xingrui Yu,et al.  Co-teaching: Robust training of deep neural networks with extremely noisy labels , 2018, NeurIPS.

[49]  Yongdong Zhang,et al.  Coarse-to-Fine Description for Fine-Grained Visual Categorization , 2016, IEEE Transactions on Image Processing.

[50]  Ling Shao,et al.  Dynamically Visual Disambiguation of Keyword-based Image Search , 2019, IJCAI.

[51]  Jian Zhang,et al.  Web-Supervised Network for Fine-Grained Visual Classification , 2020, 2020 IEEE International Conference on Multimedia and Expo (ICME).

[52]  Jian Zhang,et al.  Extracting Privileged Information from Untagged Corpora for Classifier Learning , 2018, IJCAI.

[53]  Jian Cheng,et al.  NormFace: L2 Hypersphere Embedding for Face Verification , 2017, ACM Multimedia.

[54]  Ya Zhang,et al.  Part-Stacked CNN for Fine-Grained Visual Categorization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Jian Zhang,et al.  Discovering and Distinguishing Multiple Visual Senses for Polysemous Words , 2018, AAAI.

[56]  Jian Zhang,et al.  Exploiting textual queries for dynamically visual disambiguation , 2021, Pattern Recognit..

[57]  Carlos D. Castillo,et al.  L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[58]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[59]  Ling Shao,et al.  Approximate Kernel Selection via Matrix Approximation , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[60]  Joachim Denzler,et al.  Classification-Specific Parts for Improving Fine-Grained Visual Categorization , 2019, GCPR.

[61]  Ling Shao,et al.  Extracting Multiple Visual Senses for Web Learning , 2019, IEEE Transactions on Multimedia.

[62]  Heng Tao Shen,et al.  Exploiting Web Images for Multi-Output Classification: From Category to Subcategories , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[63]  Zheng Zhang,et al.  Web-Supervised Network with Softly Update-Drop Training for Fine-Grained Visual Classification , 2020, AAAI.

[64]  Yu Liu,et al.  Learning Deep Features via Congenerous Cosine Loss for Person Recognition , 2017, ArXiv.

[65]  Yang Song,et al.  Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[66]  Tao Mei,et al.  Destruction and Construction Learning for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[68]  Guosheng Lin,et al.  SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[69]  Yizhou Yu,et al.  Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification From the Bottom Up , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Dumitru Erhan,et al.  Training Deep Neural Networks on Noisy Labels with Bootstrapping , 2014, ICLR.

[71]  Larry S. Davis,et al.  Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.