Towards Effective Deep Embedding for Zero-Shot Learning

Zero-shot learning (ZSL) can be formulated as a cross-domain matching problem: after being projected into a joint embedding space, a visual sample will match against all candidate class-level semantic descriptions and be assigned to the nearest class. In this process, the embedding space underpins the success of such matching and is crucial for ZSL. In this paper, we conduct an in-depth study on the construction of embedding space for ZSL and posit that an ideal embedding space should satisfy two criteria: intra-class compactness and inter-class separability. While the former encourages the embeddings of visual samples of one class to distribute tightly close to the semantic description embedding of this class, the latter requires embeddings from different classes to be well separated from each other. Towards this goal, we present a simple but effective two-branch network to simultaneously map semantic descriptions and visual samples into a joint space, on which visual embeddings are forced to regress to their class-level semantic embeddings and the embeddings crossing classes are required to be distinguishable by a trainable classifier. Furthermore, we extend our method to a transductive setting to better handle the model bias problem in ZSL (i.e., samples from unseen classes tend to be categorized into seen classes) with minimal extra supervision. Specifically, we propose a pseudo labeling strategy to progressively incorporate the testing samples into the training process and thus balance the model between seen and unseen classes. Experimental results on five standard ZSL datasets show the superior performance of the proposed method and its transductive extension.

[1]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[2]  James Hays,et al.  SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Pietro Perona,et al.  Multiclass recognition and part localization with humans in the loop , 2011, 2011 International Conference on Computer Vision.

[4]  Anton van den Hengel,et al.  Less is More: Zero-Shot Learning from Online Textual Documents with Noise Suppression , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Tao Xiang,et al.  Learning a Deep Embedding Model for Zero-Shot Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Yuji Matsumoto,et al.  Ridge Regression, Hubness, and Zero-Shot Learning , 2015, ECML/PKDD.

[8]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Joint Latent Similarity Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[11]  H LampertChristoph,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014 .

[12]  Xiaolin Hu,et al.  Recurrent convolutional neural network for object recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Yang Liu,et al.  Transductive Unbiased Embedding for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Yanwei Fu,et al.  Semi-supervised Vocabulary-Informed Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Shaogang Gong,et al.  Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  T. Fearn Ridge Regression , 2013 .

[17]  Shaogang Gong,et al.  Semantic Autoencoder for Zero-Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[21]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[22]  Christoph H. Lampert,et al.  Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[25]  Soma Biswas,et al.  Preserving Semantic Relations for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  XiangTao,et al.  Transductive Multi-View Zero-Shot Learning , 2015 .

[27]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[30]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[31]  Lei Zhang,et al.  Unsupervised Domain Adaptation Using Robust Class-Wise Matching , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[32]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[33]  Cordelia Schmid,et al.  Label-Embedding for Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Robert Pless,et al.  Deep Feature Interpolation for Image Content Changes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Shaogang Gong,et al.  Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation , 2014, ECCV.

[37]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Piyush Rai,et al.  A Simple Exponential Family Framework for Zero-Shot Learning , 2017, ECML/PKDD.

[40]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.