Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning

Zero-shot learning (ZSL) is a challenging task due to the lack of unseen class data during training. Existing works attempt to establish a mapping between the visual and class spaces through a common intermediate semantic space. The main limitation of existing methods is the strong bias towards seen class, known as the domain shift problem, which leads to unsatisfactory performance in both conventional and generalized ZSL tasks. To tackle this challenge, we propose to convert ZSL to the conventional supervised learning by generating features for unseen classes. To this end, a joint generative model that couples variational autoencoder (VAE) and generative adversarial network (GAN), called Zero-VAE-GAN, is proposed to generate high-quality unseen features. To enhance the class-level discriminability, an adversarial categorization network is incorporated into the joint framework. Besides, we propose two self-training strategies to augment unlabeled unseen features for the transductive extension of our model, addressing the domain shift problem to a large extent. Experimental results on five standard benchmarks and a large-scale dataset demonstrate the superiority of our generative model over the state-of-the-art methods for conventional, especially generalized ZSL tasks. Moreover, the further improvement of the transductive setting demonstrates the effectiveness of the proposed self-training strategies.

[1]  Ling Shao,et al.  Adversarial unseen visual feature synthesis for Zero-shot Learning , 2019, Neurocomputing.

[2]  Jichang Guo,et al.  Transductive Zero-Shot Learning With Adaptive Structural Embedding , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[3]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[4]  Bernt Schiele,et al.  Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[5]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Bingbing Ni,et al.  Zero-Shot Action Recognition with Error-Correcting Output Codes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[8]  James Hays,et al.  SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Cordelia Schmid,et al.  Label-Embedding for Attribute-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[12]  Gang Hua,et al.  CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Ling Shao,et al.  Beyond Semantic Attributes: Discrete Latent Attributes Learning for Zero-Shot Recognition , 2016, IEEE Signal Processing Letters.

[15]  Lei Zhang,et al.  Towards Effective Deep Embedding for Zero-Shot Learning , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[16]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[17]  Hugo Larochelle,et al.  The Neural Autoregressive Distribution Estimator , 2011, AISTATS.

[18]  Ke Chen,et al.  Zero-Shot Visual Recognition via Bidirectional Latent Embedding , 2016, International Journal of Computer Vision.

[19]  Kristen Grauman,et al.  Zero-shot recognition with unreliable attributes , 2014, NIPS.

[20]  Ling Shao,et al.  Zero-shot leaning and hashing with binary visual similes , 2018, Multimedia Tools and Applications.

[21]  Shaogang Gong,et al.  Zero-shot object recognition by semantic manifold distance , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Bin Tong,et al.  Adversarial Zero-shot Learning With Semantic Augmentation , 2018, AAAI.

[23]  Ling Shao,et al.  Generative Domain-Migration Hashing for Sketch-to-Image Retrieval , 2018, ECCV.

[24]  Kai Fan,et al.  Zero-Shot Learning via Class-Conditioned Deep Generative Models , 2017, AAAI.

[25]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[26]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Jianmin Wang,et al.  Transductive Zero-Shot Recognition via Shared Model Space Learning , 2016, AAAI.

[28]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yang Yang,et al.  Adversarial Cross-Modal Retrieval , 2017, ACM Multimedia.

[30]  Yuhong Guo,et al.  Progressive Ensemble Networks for Zero-Shot Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Ling Shao,et al.  Zero-Shot Learning Using Synthesised Unseen Visual Data with Diffusion Regularisation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Yi Fang,et al.  Deep Cross-modality Adaptation via Semantics Preserving Adversarial Learning for Sketch-based 3D Shape Retrieval , 2018, ECCV.

[33]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[34]  Michael I. Jordan,et al.  Deep Transfer Learning with Joint Adaptation Networks , 2016, ICML.

[35]  Tao Xiang,et al.  Learning a Deep Embedding Model for Zero-Shot Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[37]  Yuji Matsumoto,et al.  Ridge Regression, Hubness, and Zero-Shot Learning , 2015, ECML/PKDD.

[38]  Ramazan Gokberk Cinbis,et al.  Gradient Matching Generative Networks for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[41]  Christoph H. Lampert,et al.  Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Dongdong Chen,et al.  Transductive Zero-Shot Learning with Visual Structure Constraint , 2019, NeurIPS.

[43]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[44]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[45]  Honglak Lee,et al.  Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[46]  Yongxin Yang,et al.  A Unified Perspective on Multi-Domain and Multi-Task Learning , 2014, ICLR.

[47]  Yanwei Fu,et al.  Semi-supervised Vocabulary-Informed Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Shaogang Gong,et al.  Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[49]  Shaogang Gong,et al.  Semantic Autoencoder for Zero-Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[51]  Navdeep Jaitly,et al.  Adversarial Autoencoders , 2015, ArXiv.

[52]  Li Liu,et al.  A Joint Generative Model for Zero-Shot Learning , 2018, ECCV Workshops.

[53]  Yang Liu,et al.  Transductive Unbiased Embedding for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Frédéric Jurie,et al.  Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classiffication , 2016, ECCV.

[55]  Xiaobo Jin,et al.  Attentive Region Embedding Network for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Jan Kautz,et al.  Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[57]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[58]  Yue Gao,et al.  Zero-Shot Learning With Transferred Samples , 2017, IEEE Transactions on Image Processing.

[59]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[60]  Frédéric Jurie,et al.  Generating Visual Representations for Zero-Shot Classification , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[61]  Fatih Porikli,et al.  A Unified Approach for Conventional Zero-Shot, Generalized Zero-Shot, and Few-Shot Learning , 2017, IEEE Transactions on Image Processing.

[62]  Shuicheng Yan,et al.  SDE: A Novel Selective, Discriminative and Equalizing Feature Representation for Visual Recognition , 2017, International Journal of Computer Vision.

[63]  Andrew Zisserman,et al.  Learning Visual Attributes , 2007, NIPS.

[64]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[65]  Soma Biswas,et al.  Preserving Semantic Relations for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[66]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[67]  Bernt Schiele,et al.  Feature Generating Networks for Zero-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[68]  Tianqi Chen,et al.  Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[69]  Shaogang Gong,et al.  Attribute Learning for Understanding Unstructured Social Activity , 2012, ECCV.

[70]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[71]  Yoshua Bengio,et al.  Zero-data Learning of New Tasks , 2008, AAAI.

[72]  Venkatesh Saligrama,et al.  Learning Joint Feature Adaptation for Zero-Shot Recognition , 2016, ArXiv.

[73]  Ling Shao,et al.  From Zero-Shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[75]  Min Yang,et al.  Transductive Zero-Shot Learning via Visual Center Adaptation , 2019, AAAI.

[76]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[77]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[79]  Ling Shao,et al.  Triple Verification Network for Generalized Zero-Shot Learning , 2019, IEEE Transactions on Image Processing.

[80]  Jiechao Guan,et al.  Domain-Invariant Projection Learning for Zero-Shot Recognition , 2018, NeurIPS.

[81]  Venkatesh Saligrama,et al.  Zero-Shot Recognition via Structured Prediction , 2016, ECCV.

[82]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[83]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[84]  Xiang Zhou,et al.  Scalable Zero-Shot Learning via Binary Visual-Semantic Embeddings , 2019, IEEE Transactions on Image Processing.

[85]  Li Liu,et al.  Towards Fine-Grained Open Zero-Shot Learning: Inferring Unseen Visual Features from Attributes , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[86]  Piyush Rai,et al.  Generalized Zero-Shot Learning via Synthesized Examples , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[87]  Xiaodong Yu,et al.  Attribute-Based Transfer Learning for Object Categorization with Zero/One Training Example , 2010, ECCV.

[88]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Joint Latent Similarity Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[89]  Ling Shao,et al.  Towards Affordable Semantic Searching: Zero-Shot Retrieval via Dominant Attributes , 2018, AAAI.

[90]  Zhongfei Zhang,et al.  Transductive Zero-Shot Learning With a Self-Training Dictionary Approach , 2017, IEEE Transactions on Cybernetics.

[91]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).