A Variational Autoencoder with Deep Embedding Model for Generalized Zero-Shot Learning

Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize not only unseen classes unavailable during training, but also the seen classes used at the training stage. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., an attribute space). Most existing GZSL methods learn a cross-modal mapping between the visual feature space and the semantic space. However, a mapping model learned only from the seen classes carries an inherent bias when applied to the unseen classes. To tackle this problem, this paper integrates a deep embedding network (DE) and a modified variational autoencoder (VAE) into a novel model (DE-VAE) to learn a latent space shared by both image features and class embeddings. Specifically, the proposed model first employs the DE to learn a mapping from the semantic space to the visual feature space, and then uses the VAE to transform both the original visual features and the features obtained by this mapping into latent features. Finally, the latent features are used to train a softmax classifier. Extensive experiments on four GZSL benchmark datasets show that the proposed model significantly outperforms the state of the art.
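
The pipeline described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' exact implementation: the attribute dimension (85), visual feature dimension (2048, e.g. ResNet features), latent dimension, hidden-layer widths, and class counts are all hypothetical placeholders, and the training procedure is reduced to the core loss terms.

```python
# Minimal sketch of the DE-VAE pipeline: a deep embedding network (DE) maps
# class embeddings into the visual feature space, a VAE maps visual-space
# features into a shared latent space, and a softmax classifier is trained
# on the latent features. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepEmbedding(nn.Module):
    """DE: maps class embeddings (semantic space) to the visual feature space."""
    def __init__(self, attr_dim=85, feat_dim=2048, hidden=1600):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim), nn.ReLU(),
        )

    def forward(self, attrs):
        return self.net(attrs)

class VAE(nn.Module):
    """VAE: encodes visual-space features into the shared latent space."""
    def __init__(self, feat_dim=2048, latent_dim=64, hidden=512):
        super().__init__()
        self.enc = nn.Linear(feat_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps with eps ~ N(0, I).
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit Gaussian prior.
    recon_term = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl

# Once DE and VAE are trained, latent features for both real seen-class
# features and DE-synthesized unseen-class features feed a softmax classifier.
de, vae = DeepEmbedding(), VAE()
clf = nn.Linear(64, 50)                # latent_dim -> total number of classes

x_seen = torch.randn(32, 2048)         # real visual features (seen classes)
attrs_unseen = torch.randn(32, 85)     # class embeddings (unseen classes)
x_synth = de(attrs_unseen)             # synthesized visual-space features

mu_seen, _ = vae.encode(x_seen)        # latent features of real samples
mu_synth, _ = vae.encode(x_synth)      # latent features of synthesized samples
logits = clf(torch.cat([mu_seen, mu_synth]))  # trained with cross-entropy
```

Training the final classifier on latent features from both real and synthesized samples is what lets it cover seen and unseen classes jointly, which is the mechanism the abstract credits for reducing the seen-class bias.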