Generic compact representation through visual-semantic ambiguity removal

Abstract: Zero-Shot Hashing (ZSH) aims to learn compact binary codes that preserve the semantic content of images from unseen categories. Conventional approaches project visual features into a semantic space that is shared by both seen and unseen categories. However, we observe that such a one-way paradigm suffers from the visual-semantic ambiguity problem: semantic concepts (e.g. attributes) do not correspond explicitly to visual patterns, and vice versa. This problem can lead to a large variance in the visual features associated with each attribute. In this paper, we investigate how to remove such semantic ambiguity based on the observed visual appearances. In particular, we propose (1) a novel latent attribute space to mitigate the gap between visual appearances and semantic expressions; (2) a dual-graph regularised embedding algorithm called Visual-Semantic Ambiguity Removal (VSAR) that simultaneously extracts the shared components between visual and semantic information and mutually aligns the data distributions based on the intrinsic local structures of both spaces; (3) a new zero-shot hashing framework that can deal with both instance-level and category-level tasks. We validate our method on four popular benchmarks. Extensive experiments demonstrate that our proposed approach significantly outperforms the state-of-the-art methods.
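To make the notion of "compact binary codes" concrete, the following is a minimal sketch of how a real-valued feature vector can be turned into a short binary code and how retrieval can then rank items by Hamming distance. It is not the VSAR algorithm described in the abstract: it swaps in plain random-projection hashing as a stand-in, and every name in it (`make_projection`, `hash_code`, `hamming`, `rank_by_hamming`) is hypothetical and chosen only for illustration.

```python
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim.

    Assumption: random projections stand in for a learned embedding;
    the paper's method learns its projection instead.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(feature, projection):
    """Pack the sign of each projection into one integer, one bit per row."""
    code = 0
    for row in projection:
        activation = sum(w * x for w, x in zip(row, feature))
        code = (code << 1) | (1 if activation >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
```

In a zero-shot setting the projection would be trained on seen categories only and then applied unchanged to images from unseen categories; the sketch above only shows the encoding and ranking steps that any such hashing pipeline shares.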
