Learning Joint Feature Adaptation for Zero-Shot Recognition

Zero-shot recognition (ZSR) aims to recognize target-domain data instances of unseen classes based on the models learned from associated pairs of seen-class source and target domain data. One of the key challenges in ZSR is the relative scarcity of source-domain features (e.g. one feature vector per class), which do not fully account for wide variability in target-domain instances. In this paper we propose a novel framework of learning data-dependent feature transforms for scoring similarity between an arbitrary pair of source and target data instances to account for the wide variability in target domain. Our proposed approach is based on optimizing over a parameterized family of local feature displacements that maximize the source-target adaptive similarity functions. Accordingly we propose formulating zero-shot learning (ZSL) using latent structural SVMs to learn our similarity functions from training data. As demonstration we design a specific algorithm under the proposed framework involving bilinear similarity functions and regularized least squares as penalties for feature displacement. We test our approach on several benchmark datasets for ZSR and show significant improvement over the state-of-the-art. For instance, on aP&Y dataset we can achieve 80.89% in terms of recognition accuracy, outperforming the state-of-the-art by 11.15%.

[1]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[2]  Prateek Jain,et al.  Sparse Local Embeddings for Extreme Multi-label Classification , 2015, NIPS.

[3]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Joint Latent Similarity Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  C. Lawrence Zitnick,et al.  Zero-Shot Learning via Visual Abstraction , 2014, ECCV.

[5]  Mubarak Shah,et al.  Fast Zero-Shot Image Tagging , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[9]  Qiang Ji,et al.  A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Yi Yang,et al.  Exploring Semantic Inter-Class Relationships (SIR) for Zero-Shot Action Recognition , 2015, AAAI.

[11]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[12]  Dale Schuurmans,et al.  Semi-Supervised Zero-Shot Classification with Label Representation Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[14]  Xin Li,et al.  Max-Margin Zero-Shot Learning for Multi-class Classification , 2015, AISTATS.

[15]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Vinod Nair,et al.  A joint learning framework for attribute models and object descriptions , 2011, 2011 International Conference on Computer Vision.

[17]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[18]  Anton van den Hengel,et al.  Less is More: Zero-Shot Learning from Online Textual Documents with Noise Suppression , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Rainer Stiefelhagen,et al.  Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[21]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[22]  Shaogang Gong,et al.  Zero-shot object recognition by semantic manifold distance , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[24]  Yi Yang,et al.  Semantic Concept Discovery for Large-Scale Zero-Shot Event Detection , 2015, IJCAI.

[25]  Gabriela Csurka,et al.  Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost , 2012, ECCV.

[26]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[27]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[29]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Cordelia Schmid,et al.  Label-Embedding for Attribute-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[32]  Shuang Wu,et al.  Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Jianmin Wang,et al.  Transductive Zero-Shot Recognition via Shared Model Space Learning , 2016, AAAI.

[34]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[36]  Chen Xu,et al.  The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding , 2014, International Journal of Computer Vision.

[37]  Bernt Schiele,et al.  Transfer Learning in a Transductive Setting , 2013, NIPS.

[38]  Venkatesh Saligrama,et al.  Zero-Shot Recognition via Structured Prediction , 2016, ECCV.

[39]  Kristen Grauman,et al.  Zero-shot recognition with unreliable attributes , 2014, NIPS.

[40]  Shaogang Gong,et al.  Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[41]  Bernt Schiele,et al.  Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[42]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[44]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Kristen Grauman,et al.  Interactively building a discriminative vocabulary of nameable attributes , 2011, CVPR 2011.

[46]  Xiaodong Yu,et al.  Attribute-Based Transfer Learning for Object Categorization with Zero/One Training Example , 2010, ECCV.

[47]  Wei Ping,et al.  Marginal Structured SVM with Hidden Variables , 2014, ICML.

[48]  Shaogang Gong,et al.  Transductive Multi-View Zero-Shot Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Ahmed M. Elgammal,et al.  Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos , 2015, AAAI.

[50]  Yueting Zhuang,et al.  Relational Knowledge Transfer for Zero-Shot Learning , 2016, AAAI.

[51]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Shaogang Gong,et al.  Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation , 2014, ECCV.

[53]  Frédéric Jurie,et al.  Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classiffication , 2016, ECCV.

[54]  S. V. N. Vishwanathan,et al.  Efficient max-margin multi-label classification with applications to zero-shot learning , 2012, Machine Learning.

[55]  Cees Snoek,et al.  COSTA: Co-Occurrence Statistics for Zero-Shot Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Shih-Fu Chang,et al.  Designing Category-Level Attributes for Discriminative Visual Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Alexander C. Berg,et al.  Automatic Attribute Discovery and Characterization from Noisy Web Data , 2010, ECCV.