论文信息 - Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classiffication

Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classiffication

This paper addresses the task of zero-shot image classification. The key contribution of the proposed approach is to control the semantic embedding of images -- one of the main ingredients of zero-shot learning -- by formulating it as a metric learning problem. The optimized empirical criterion associates two types of sub-task constraints: metric discriminating capacity and accurate attribute prediction. This results in a novel expression of zero-shot learning not requiring the notion of class in the training phase: only pairs of image/attributes, augmented with a consistency indicator, are given as ground truth. At test time, the learned model can predict the consistency of a test image with a given set of attributes , allowing flexible ways to produce recognition inferences. Despite its simplicity, the proposed approach gives state-of-the-art results on four challenging datasets used for zero-shot recognition evaluation.

[1] Yoram Singer,et al. Online and batch learning of pseudo-metrics , 2004, ICML.

[2] Gabriela Csurka,et al. Visual categorization with bags of keypoints , 2002, eccv 2004.

[3] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Geoffrey E. Hinton,et al. Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[5] Gang Wang,et al. Joint learning of visual attributes, object classes and visual saliency , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Alexander C. Berg,et al. Automatic Attribute Discovery and Characterization from Noisy Web Data , 2010, ECCV.

[8] Yang Wang,et al. A Discriminative Latent Model of Object Classes and Attributes , 2010, ECCV.

[9] Kristen Grauman,et al. Relative attributes , 2011, 2011 International Conference on Computer Vision.

[10] Jason Weston,et al. WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[11] Vinod Nair,et al. A joint learning framework for attribute models and object descriptions , 2011, 2011 International Conference on Computer Vision.

[12] Kristen Grauman,et al. Interactively building a discriminative vocabulary of nameable attributes , 2011, CVPR 2011.

[13] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[14] Pietro Perona,et al. The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[15] Bernt Schiele,et al. Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[16] Kun Duan,et al. Discovering localized attributes for fine-grained recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Gabriela Csurka,et al. Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost , 2012, ECCV.

[18] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19] Vinod Nair,et al. Learning hierarchical similarity metrics , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20] Shih-Fu Chang,et al. Designing Category-Level Attributes for Discriminative Visual Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[22] Chen Xu,et al. The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding , 2014, International Journal of Computer Vision.

[23] Marc Sebban,et al. A Survey on Metric Learning for Feature Vectors and Structured Data , 2013, ArXiv.

[24] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[25] Thomas Mensink,et al. Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[26] Babak Saleh,et al. Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.

[27] Kristen Grauman,et al. Zero-shot recognition with unreliable attributes , 2014, NIPS.

[28] Shuang Wu,et al. Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Takayuki Okatani,et al. Understanding Convolutional Neural Networks in Terms of Category-Level Attributes , 2014, ACCV.

[30] Christoph H. Lampert,et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31] Samy Bengio,et al. Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[32] Shaogang Gong,et al. Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation , 2014, ECCV.

[33] Cees Snoek,et al. COSTA: Co-Occurrence Statistics for Zero-Shot Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Philip H. S. Torr,et al. An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[35] Shaogang Gong,et al. Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36] Sanja Fidler,et al. Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37] Shaogang Gong,et al. Zero-shot object recognition by semantic manifold distance , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Dale Schuurmans,et al. Semi-Supervised Zero-Shot Classification with Label Representation Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39] Venkatesh Saligrama,et al. Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[40] Bernard Ghanem,et al. On the relationship between visual attributes and convolutional networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Bernt Schiele,et al. Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Mikhail Belkin,et al. Probabilistic Zero-shot Classification with Semantic Rankings , 2015, ArXiv.

[43] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[44] Cordelia Schmid,et al. Label-Embedding for Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45] Bodo Rosenhahn,et al. Exploiting View-Specific Appearance Similarities Across Classes for Zero-Shot Pose Prediction: A Metric Learning Approach , 2016, AAAI.

[46] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[47] Venkatesh Saligrama,et al. Zero-Shot Learning via Joint Latent Similarity Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Tapani Raiko,et al. International Conference on Learning Representations (ICLR) , 2016 .