Adversarial Unseen Visual Feature Synthesis for Zero-Shot Learning

Abstract Due to the extreme imbalance of training data between seen and unseen classes, most existing methods fail to achieve satisfactory results on the challenging task of Zero-Shot Learning (ZSL). To avoid the need for labelled data of unseen classes, in this paper we investigate how to synthesize visual features for the ZSL problem. The key challenge is capturing the realistic feature distribution of unseen classes without any training samples. To this end, we propose a hybrid model consisting of Random Attribute Selection (RAS) and a conditional Generative Adversarial Network (cGAN). RAS learns to generate realistic attributes by exploiting their natural correlations. To improve discrimination across the large number of classes, we add a reconstruction loss to the generative network, which alleviates the domain shift problem and significantly improves classification accuracy. Extensive experiments on four benchmarks demonstrate that our method outperforms all state-of-the-art methods. Qualitative results show that, compared to conventional generative models, our method captures a more realistic distribution and remarkably improves the variability of the synthesized data.
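The two components named in the abstract can be illustrated with a minimal numpy sketch. Everything below is a toy illustration, not the paper's implementation: the linear "generator" and "decoder" matrices, the `keep_prob` hyper-parameter, and the 85-d attribute / 2048-d feature dimensions (typical of AwA attributes and ResNet features) are all assumptions. It shows the shape of the idea only: RAS perturbs the conditioning attribute vector, and a reconstruction loss ties synthesized features back to their attributes.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_attribute_selection(attr, keep_prob=0.7, rng=rng):
    # Hypothetical RAS sketch: randomly keep a subset of attribute
    # dimensions so the generator sees varied attribute vectors.
    mask = rng.random(attr.shape) < keep_prob
    return attr * mask

def reconstruction_loss(synth_feat, attr, R):
    # L2 loss between the attribute reconstructed from a synthesized
    # feature (via a toy linear decoder R) and the original attribute.
    recon = synth_feat @ R
    return float(np.mean((recon - attr) ** 2))

# Toy dimensions: 85-d attributes, 16-d noise, 2048-d visual features.
attr = rng.random(85)
G = rng.normal(size=(85 + 16, 2048)) * 0.01   # toy linear "generator"
R = rng.normal(size=(2048, 85)) * 0.01        # toy linear "decoder"

z = rng.normal(size=16)                        # noise vector
a_sel = random_attribute_selection(attr)       # perturbed condition
feat = np.concatenate([a_sel, z]) @ G          # synthesized visual feature
loss = reconstruction_loss(feat, attr, R)
print(feat.shape, loss)
```

In the full model, an adversarial loss from the cGAN discriminator would be added to this reconstruction term, and `G`/`R` would be trained networks rather than fixed random matrices.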
