A Joint Label Space for Generalized Zero-Shot Classification

The fundamental problem of Zero-Shot Learning (ZSL) is that the one-hot label space is discrete, which leads to a complete loss of the relationships between seen and unseen classes. Conventional approaches rely on using semantic auxiliary information, e.g. attributes, to re-encode each class so as to preserve the inter-class associations. However, existing learning algorithms only focus on unifying visual and semantic spaces without jointly considering the label space. More importantly, because the final classification is conducted in the label space through a compatibility function, the gap between attribute and label spaces leads to significant performance degradation. Therefore, this paper proposes a novel pathway that uses the label space to jointly reconcile visual and semantic spaces directly, which is named Attributing Label Space (ALS). In the training phase, one-hot labels of seen classes are directly used as prototypes in a common space, where both images and attributes are mapped. Since mappings can be optimized independently, the computational complexity is extremely low. In addition, the correlation between semantic attributes has less influence on visual embedding training because features are mapped into labels instead of attributes. In the testing phase, the discrete condition of label space is removed, and priori one-hot labels are used to denote seen classes and further compose labels of unseen classes. Therefore, the label space is very discriminative for the Generalized ZSL (GZSL), which is more reasonable and challenging for real-world applications. Extensive experiments on five benchmarks manifest improved performance over all of compared state-of-the-art methods.

[1]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Joint Latent Similarity Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Louis-Philippe Morency,et al.  Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Kristen Grauman,et al.  Decorrelating Semantic Visual Attributes by Resisting the Urge to Share , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Cordelia Schmid,et al.  Label-Embedding for Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Jason Weston,et al.  A kernel method for multi-labelled classification , 2001, NIPS.

[6]  Aram Kawewong,et al.  Online incremental attribute-based zero-shot learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Tongxing Lu,et al.  Solution of the matrix equation AX−XB=C , 2005, Computing.

[8]  Tao Xiang,et al.  Learning a Deep Embedding Model for Zero-Shot Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yuji Matsumoto,et al.  Ridge Regression, Hubness, and Zero-Shot Learning , 2015, ECML/PKDD.

[10]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Ling Shao,et al.  Zero Shot Learning via Low-rank Embedded Semantic AutoEncoder , 2018, IJCAI.

[12]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Chen Xu,et al.  The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding , 2014, International Journal of Computer Vision.

[14]  Alan L. Yuille,et al.  Multiple Instance Visual-Semantic Embedding , 2017 .

[15]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[16]  Christoph H. Lampert,et al.  Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Richard H. Bartels,et al.  Algorithm 432 [C2]: Solution of the matrix equation AX + XB = C [F4] , 1972, Commun. ACM.

[18]  Shaogang Gong,et al.  Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation , 2014, ECCV.

[19]  Abhinav Gupta,et al.  Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Bernt Schiele,et al.  Learning Deep Representations of Fine-Grained Visual Descriptions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Soma Biswas,et al.  Preserving Semantic Relations for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Wei Liu,et al.  Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Shiguang Shan,et al.  Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition , 2018, ECCV.

[26]  Dragomir Anguelov,et al.  Capturing Long-Tail Distributions of Object Subcategories , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[29]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[30]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[31]  Hongguang Zhang,et al.  Zero-Shot Kernel Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[33]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[34]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[35]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[37]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[38]  Ming Shao,et al.  Low-Rank Embedded Ensemble Semantic Dictionary for Zero-Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Ling Shao,et al.  From Zero-Shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[41]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[43]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[44]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[45]  Alexandros Nanopoulos,et al.  Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data , 2010, J. Mach. Learn. Res..

[46]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[47]  Yuri Owechko,et al.  Zero Shot Learning via Multi-scale Manifold Regularization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Yang Yang,et al.  Matrix Tri-Factorization with Manifold Regularizations for Zero-Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Shaogang Gong,et al.  Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Shaogang Gong,et al.  Semantic Autoencoder for Zero-Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.