A Simple Approach for Zero-Shot Learning based on Triplet Distribution Embeddings

Given the semantic descriptions of classes, Zero-Shot Learning (ZSL) aims to recognize unseen classes without labeled training data by exploiting semantic information that encodes the relationships between seen and unseen classes. Existing ZSL methods mainly represent image and class embeddings as single points (vectors) in the semantic space. Despite its popularity, such a point-based representation limits the expressivity available for modeling the intra-class variability of each class. We address this issue by using distribution embeddings: both image embeddings and class embeddings are modeled as Gaussian distributions, and their similarity relationships are preserved through triplet constraints. The key intuition guiding our approach is that, for each image, the embedding of the correct class label should be closer to the image embedding than that of any other class label. Extensive experiments on multiple benchmark datasets show that the proposed method achieves highly competitive results in both the traditional ZSL setting and the more challenging Generalized Zero-Shot Learning (GZSL) setting.
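
To make the triplet idea concrete, the sketch below shows one hedged way such constraints over Gaussian embeddings could be implemented, assuming diagonal-covariance Gaussians compared with the closed-form 2-Wasserstein distance; the names `w2_gaussian`, `triplet_distribution_loss`, and `margin` are illustrative choices and not taken from the paper.

```python
import torch

def w2_gaussian(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between diagonal Gaussians.

    mu* and sigma* have shape (batch, dim); sigma* holds standard deviations.
    For diagonal Gaussians the distance has the closed form
    ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2.
    """
    return ((mu1 - mu2) ** 2).sum(dim=-1) + ((sigma1 - sigma2) ** 2).sum(dim=-1)

def triplet_distribution_loss(img_mu, img_sigma,
                              pos_mu, pos_sigma,
                              neg_mu, neg_sigma,
                              margin=1.0):
    """Hinge triplet loss: for each image, the distribution of the correct
    (positive) class should be closer than that of a wrong (negative) class
    by at least `margin`. (Illustrative sketch, not the paper's exact loss.)"""
    d_pos = w2_gaussian(img_mu, img_sigma, pos_mu, pos_sigma)
    d_neg = w2_gaussian(img_mu, img_sigma, neg_mu, neg_sigma)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Toy usage: 4 images, 16-dimensional embeddings; sigmas kept positive via softplus.
if __name__ == "__main__":
    B, D = 4, 16
    img_mu, pos_mu, neg_mu = (torch.randn(B, D) for _ in range(3))
    img_s, pos_s, neg_s = (torch.nn.functional.softplus(torch.randn(B, D))
                           for _ in range(3))
    print(triplet_distribution_loss(img_mu, img_s, pos_mu, pos_s, neg_mu, neg_s).item())
```

The hinge form directly encodes the stated intuition: the loss vanishes once the correct class distribution is closer to the image distribution than a negative class distribution by at least the chosen margin.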
