Data-driven Meta-set Based Fine-Grained Visual Recognition

Constructing fine-grained image datasets typically requires domain-specific expert knowledge, which annotators on crowd-sourcing platforms often lack. Accordingly, learning directly from web images has become an alternative for fine-grained visual recognition. However, label noise in the web training set can severely degrade model performance. To this end, we propose a data-driven, meta-set based approach to deal with noisy web images for fine-grained recognition. Specifically, guided by a small, clean meta-set, we train a selection net in a meta-learning manner to distinguish in-distribution from out-of-distribution noisy images. To further boost the robustness of the model, we also learn a labeling net to correct the labels of in-distribution noisy data. In this way, the proposed method alleviates the harmful effects of out-of-distribution noise and properly exploits the in-distribution noisy samples for training. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach clearly outperforms state-of-the-art noise-robust methods.
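
The idea of letting a small clean meta-set decide which noisy samples to trust can be illustrated with a toy sketch. This is not the selection net or labeling net described above: it swaps in a simpler one-step gradient-agreement heuristic, in the spirit of gradient-based example reweighting, on a logistic-regression model. All names (`grad`, `meta_weights`, `train_step`), the learning rate, and the data layout are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, x, y):
    """Gradient of the logistic loss for one sample (x: list of floats, y: 0 or 1)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def meta_weights(w, train, meta):
    """Weight each noisy training sample by how well its gradient agrees
    with the mean gradient on the clean meta-set (negative agreement -> 0)."""
    g_meta = [0.0] * len(w)
    for x, y in meta:
        for j, gj in enumerate(grad(w, x, y)):
            g_meta[j] += gj / len(meta)
    raw = [max(0.0, sum(a * b for a, b in zip(grad(w, x, y), g_meta)))
           for x, y in train]
    total = sum(raw)
    # Normalise so the weights sum to 1; all zeros if no sample agrees.
    return [r / total if total > 0 else 0.0 for r in raw]

def train_step(w, train, meta, lr=0.5):
    """One weighted gradient step on the noisy set, guided by the meta-set."""
    weights = meta_weights(w, train, meta)
    new_w = list(w)
    for (x, y), s in zip(train, weights):
        for j, gj in enumerate(grad(w, x, y)):
            new_w[j] -= lr * s * gj
    return new_w, weights

# Hypothetical toy data: each x is [feature, bias]; the second training
# sample carries a flipped (noisy) label.
meta = [([1.0, 1.0], 1), ([-1.0, 1.0], 0)]
train = [([2.0, 1.0], 1), ([2.0, 1.0], 0), ([-2.0, 1.0], 0)]
w, weights = train_step([0.0, 0.0], train, meta)
```

In this toy setting the mislabeled sample pulls the model in a direction that disagrees with the meta-set gradient, so its weight is clipped to zero and it does not affect the update. A learned selection net would replace the hand-written agreement rule, and a labeling net would additionally try to recover a usable label for in-distribution samples instead of discarding them.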
