Semantic Graph for Zero-Shot Learning

Zero-shot learning aims to classify visual objects without any training data by transferring knowledge between seen and unseen classes. This is typically achieved by exploring a semantic embedding space in which the seen and unseen classes can be related. Previous works differ in which embedding space is used and in how different classes and a test image are related. In this paper, we adopt the annotation-free semantic word space for the former and focus on the latter issue of modeling relatedness. Specifically, in contrast to previous work, which ignores the semantic relationships among seen classes and focuses merely on those between seen and unseen classes, we propose a novel approach based on a semantic graph that represents the relationships among all seen and unseen classes in a semantic word space. On this semantic graph, we design a special absorbing Markov chain process in which each unseen class is treated as an absorbing state. After incorporating a test image into the semantic graph, the absorbing probabilities from the test image to each unseen class can be computed efficiently, and zero-shot classification is achieved by selecting the class label with the highest absorbing probability. The proposed model has a closed-form solution whose cost is linear in the number of test images. We demonstrate the effectiveness and computational efficiency of the proposed method over state-of-the-art approaches on the AwA (Animals with Attributes) dataset.
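
The absorbing-probability computation admits the standard closed form for absorbing Markov chains: with the transition matrix in canonical form P = [[Q, R], [0, I]], where Q holds transitions among transient states and R holds transitions into absorbing states, the matrix of absorption probabilities is B = (I - Q)^{-1} R. The sketch below illustrates this classification rule in NumPy. It is a minimal illustration under assumptions not specified in the abstract: the graph-weight choice (a dense Gaussian affinity here; the actual method may sparsify or weight the graph differently) and all function and parameter names are hypothetical.

```python
import numpy as np

def zero_shot_absorbing_markov(x_embed, seen_embed, unseen_embed, sigma=1.0):
    """Classify one test embedding via absorbing probabilities on a semantic graph.

    x_embed      : (d,)   test image mapped into the semantic word space
    seen_embed   : (s, d) word vectors of the s seen classes (transient states)
    unseen_embed : (u, d) word vectors of the u unseen classes (absorbing states)
    Returns the index of the unseen class with the highest absorbing probability.
    """
    # Transient states: the test image node followed by the seen classes.
    transient = np.vstack([x_embed[None, :], seen_embed])   # (1 + s, d)
    absorbing = unseen_embed                                 # (u, d)

    def affinity(A, B):
        # Gaussian affinity between two sets of semantic vectors (an assumption).
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    W_tt = affinity(transient, transient)   # transient -> transient weights
    W_ta = affinity(transient, absorbing)   # transient -> absorbing weights
    np.fill_diagonal(W_tt, 0.0)             # no self-loops among transient states

    # Row-normalize so each transient row of P = [[Q, R], [0, I]] sums to 1.
    row_sum = W_tt.sum(1, keepdims=True) + W_ta.sum(1, keepdims=True)
    Q = W_tt / row_sum
    R = W_ta / row_sum

    # Absorbing probabilities: B = (I - Q)^{-1} R, solved as a linear system.
    B = np.linalg.solve(np.eye(Q.shape[0]) - Q, R)

    # Row 0 corresponds to the test image; pick the most probable unseen class.
    return int(np.argmax(B[0]))
```

Note that solving the linear system (I - Q) B = R avoids an explicit matrix inverse, and since the transient block involves only the seen classes plus the single test node, the per-image cost is fixed, which is consistent with the claimed linear scaling in the number of test images.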
