Structure-Aware Feature Generation for Zero-Shot Learning

Zero-Shot Learning (ZSL) targets at recognizing unseen categories by leveraging auxiliary information, such as attribute embedding. Despite the encouraging results achieved, prior ZSL approaches focus on improving the discriminant power of seen-class features, yet have largely overlooked the geometric structure of the samples and the prototypes. The subsequent attribute-based generative adversarial network (GAN), as a result, also neglects the topological information in sample generation and further yields inferior performances in classifying the visual features of unseen classes. In this paper, we introduce a novel structure-aware feature generation scheme, termed as SA-GAN, to explicitly account for the topological structure in learning both the latent space and the generative networks. Specifically, we introduce a constraint loss to preserve the initial geometric structure when learning a discriminative latent space, and carry out our GAN training with additional supervising signals from a structure-aware discriminator and a reconstruction module. The former supervision distinguishes fake and real samples based on their affinity to class prototypes, while the latter aims to reconstruct the original feature space from the generated latent space. This topology-preserving mechanism enables our method to significantly enhance the generalization capability on unseen-classes and consequently improve the classification performance. Experiments on four benchmarks demonstrate that the proposed approach consistently outperforms the state of the art. Our code can be found in the supplementary material and will also be made publicly available.

[1]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[2]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[3]  Ling Shao,et al.  Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning , 2020, IEEE Transactions on Image Processing.

[4]  Christoph H. Lampert,et al.  Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[6]  Shaogang Gong,et al.  Semantic Autoencoder for Zero-Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ling Shao,et al.  A Joint Label Space for Generalized Zero-Shot Classification , 2020, IEEE Transactions on Image Processing.

[9]  Ngai-Man Cheung,et al.  Improving GAN with neighbors embedding and gradient matching , 2018, AAAI.

[10]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[11]  James Hays,et al.  SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Bernt Schiele,et al.  Zero-Shot Learning — The Good, the Bad and the Ugly , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Zi Huang,et al.  Leveraging the Invariant Side of Generative Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[15]  Fahad Shahbaz Khan,et al.  Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification , 2020, ECCV.

[16]  Dongdong Chen,et al.  Transductive Zero-Shot Learning with Visual Structure Constraint , 2019, NeurIPS.

[17]  Tao Xiang,et al.  Learning a Deep Embedding Model for Zero-Shot Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jianwen Xie,et al.  Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Ngai-Man Cheung,et al.  Dist-GAN: An Improved GAN Using Distance Constraints , 2018, ECCV.

[20]  Liang Wang,et al.  Deep Unbiased Embedding Transfer for Zero-Shot Learning , 2020, IEEE Transactions on Image Processing.

[21]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Ling Shao,et al.  From Zero-Shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[24]  Bernt Schiele,et al.  Feature Generating Networks for Zero-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Fatih Porikli,et al.  A Unified Approach for Conventional Zero-Shot, Generalized Zero-Shot, and Few-Shot Learning , 2017, IEEE Transactions on Image Processing.

[27]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[28]  Michael I. Jordan,et al.  Generalized Zero-Shot Learning with Deep Calibration Network , 2018, NeurIPS.

[29]  Narayanan C. Krishnan,et al.  Semantically Aligned Bias Reducing Zero Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Bernt Schiele,et al.  F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Mayank Vatsa,et al.  Generalized Zero-Shot Learning via Over-Complete Distribution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Jian Yang,et al.  Learning the Redundancy-Free Features for Generalized Zero-Shot Object Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[36]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[37]  Bernt Schiele,et al.  Learning Deep Representations of Fine-Grained Visual Descriptions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Shiguang Shan,et al.  Transferable Contrastive Network for Generalized Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Yoshua Bengio,et al.  Zero-data Learning of New Tasks , 2008, AAAI.

[40]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[41]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[42]  Xiang Zhou,et al.  Scalable Zero-Shot Learning via Binary Visual-Semantic Embeddings , 2019, IEEE Transactions on Image Processing.

[43]  Cordelia Schmid,et al.  Label-Embedding for Attribute-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Yun Fu,et al.  Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45]  Frédéric Jurie,et al.  Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classiffication , 2016, ECCV.

[46]  Piyush Rai,et al.  A Simple Exponential Family Framework for Zero-Shot Learning , 2017, ECML/PKDD.

[47]  Ling Shao,et al.  Triple Verification Network for Generalized Zero-Shot Learning , 2019, IEEE Transactions on Image Processing.

[48]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[49]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[50]  Venkatesh Saligrama,et al.  Learning Joint Feature Adaptation for Zero-Shot Recognition , 2016, ArXiv.

[51]  Trevor Darrell,et al.  Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Piyush Rai,et al.  Generalized Zero-Shot Learning via Synthesized Examples , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[54]  Carl Doersch,et al.  Tutorial on Variational Autoencoders , 2016, ArXiv.

[55]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[56]  Ramazan Gokberk Cinbis,et al.  Gradient Matching Generative Networks for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[60]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Wei-Yun Yau,et al.  Deep Subspace Clustering with Sparsity Prior , 2016, IJCAI.