Fine-grained Classification using Heterogeneous Web Data and Auxiliary Categories

Fine-grained classification remains very challenging because well-labeled training data are scarce: annotating a large number of fine-grained categories is costly. In the extreme case, where the test categories have no well-labeled training data at all, most existing work follows one of two research directions: 1) crawl noisily labeled web data for the test categories and use them as training data, known as webly supervised learning; 2) transfer knowledge from auxiliary categories with well-labeled training data to the test categories, which corresponds to the zero-shot learning setting. Both directions still have critical issues. For the first, web data have noisy labels and a data distribution considerably different from that of the test data. For the second, zero-shot learning struggles to achieve results competitive with conventional supervised learning. These issues motivate us to develop a novel approach that jointly exploits noisy web training data from the test categories and well-labeled training data from the auxiliary categories. In particular, on the one hand, we crawl web data for the test categories as noisy training data. On the other hand, we transfer knowledge from the auxiliary categories to the test categories by virtue of freely available semantic information (e.g., word vectors) for all categories. Moreover, because web data are generally accompanied by additional textual information (e.g., titles and tags), we extend our method to use this surrounding text as privileged information. Extensive experiments demonstrate the effectiveness of the proposed methods.
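
The knowledge-transfer idea in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper: it is a generic ridge-regression visual-semantic mapping, a common zero-shot baseline, shown only to make concrete how word vectors can link auxiliary and test categories. The function names, the regularization value, and the use of cosine similarity are all assumptions made for illustration.

```python
import numpy as np


def fit_visual_semantic_map(X_aux, y_aux, S_aux, reg=1.0):
    """Fit a linear map from visual features to word vectors (illustrative only).

    X_aux: (n, d) features of well-labeled auxiliary-category images.
    y_aux: (n,) integer labels indexing rows of S_aux.
    S_aux: (c_aux, k) word vectors of the auxiliary categories.
    reg:   ridge regularization strength (hypothetical default).
    """
    targets = S_aux[y_aux]  # each image regresses onto its category's word vector
    d = X_aux.shape[1]
    # Closed-form ridge solution: W = (X^T X + reg * I)^{-1} X^T T
    return np.linalg.solve(X_aux.T @ X_aux + reg * np.eye(d), X_aux.T @ targets)


def predict_test_categories(X_test, W, S_test):
    """Assign each test image to the test category with the closest word vector."""
    projected = X_test @ W
    projected = projected / (np.linalg.norm(projected, axis=1, keepdims=True) + 1e-12)
    semantic = S_test / (np.linalg.norm(S_test, axis=1, keepdims=True) + 1e-12)
    # Cosine similarity between projected features and test-category word vectors
    return np.argmax(projected @ semantic.T, axis=1)
```

In this sketch the mapping is learned only from auxiliary categories; noisy web images of the test categories, which the abstract proposes to exploit jointly, are not used here.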
