Paper Information - Zero-Shot Learning with Structured Embeddings

Despite significant recent advances in image classification, fine-grained classification remains a challenge. In the present paper, we address the zero-shot and few-shot learning scenarios, because obtaining labeled data is especially difficult for fine-grained classification tasks. First, we embed state-of-the-art image descriptors in a label embedding space using side information such as attributes. We argue that learning a joint embedding space that maximizes the compatibility between the input and output embeddings is highly effective for zero/few-shot learning. We show empirically that such embeddings significantly outperform the current state-of-the-art methods on two challenging datasets (Caltech-UCSD Birds and Animals with Attributes). Second, to reduce the amount of costly manual attribute annotation, we use alternative output embeddings based on word-vector representations, obtained from large text corpora without any supervision. We report that such unsupervised embeddings achieve encouraging results and lead to further improvements when combined with the supervised ones.
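The idea of scoring an image against class embeddings can be illustrated with a minimal sketch. The abstract does not state the functional form of the compatibility, so the bilinear score F(x, y) = xᵀ W y used here is an assumption for illustration only. The function names, the toy feature and attribute vectors, and the hand-set matrix `W` are all hypothetical; in practice `W` would be learned from seen classes.

```python
def compatibility(x, W, y):
    """Assumed bilinear score F(x, y) = x^T W y.

    x: image feature vector (input embedding)
    y: class embedding, e.g. an attribute vector (output embedding)
    W: matrix mapping between the two spaces (len(x) rows, len(y) columns)
    """
    return sum(
        x[i] * W[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def predict_zero_shot(x, W, class_embeddings):
    """Return the class label whose embedding is most compatible with x.

    class_embeddings maps label -> embedding. The classes need not have
    appeared during training; only their side information is required.
    """
    return max(
        class_embeddings,
        key=lambda label: compatibility(x, W, class_embeddings[label]),
    )


# Toy data (all values made up): 2-D image features, 3 attributes per class.
W = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]
unseen_classes = {
    "zebra": [1.0, 0.0, 1.0],
    "whale": [0.0, 1.0, 0.0],
}
x = [0.9, 0.1]

scores = {label: compatibility(x, W, y) for label, y in unseen_classes.items()}
prediction = predict_zero_shot(x, W, unseen_classes)
print(scores, prediction)
```

Because the prediction only needs an embedding for each candidate class, swapping attribute vectors for word vectors changes the contents of `unseen_classes` but not the scoring code.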

Bernt Schiele | Honglak Lee | Zeynep Akata
