Bridging the Web Data and Fine-Grained Visual Recognition via Alleviating Label Noise and Domain Mismatch

To distinguish the subtle differences among fine-grained categories, a large number of well-labeled images is typically required. However, manually annotating fine-grained categories is extremely difficult, as it usually demands professional knowledge. To this end, we propose to directly leverage web images for fine-grained visual recognition. Our work focuses on two critical issues in web images: "label noise" and "domain mismatch". Specifically, we propose an end-to-end deep denoising network (DDN) model that jointly addresses both problems during web image selection. To verify the effectiveness of our approach, we first collect web images using the labels in fine-grained datasets. We then apply the proposed DDN model for noise removal and domain mismatch alleviation, and use the selected web images as the training set for learning fine-grained categorization models. Extensive experiments and ablation studies demonstrate the state-of-the-art performance of our approach, which also delivers a new pipeline for fine-grained visual categorization that is expected to be highly effective for real-world applications.
