Zero-shot recognition with latent visual attributes learning

Zero-shot learning (ZSL) aims to recognize novel object categories by transferring knowledge extracted from the seen categories (source domain) to the unseen categories (target domain). Most recent ZSL methods concentrate on learning a visual-semantic alignment that bridges image features and their semantic representations, relying solely on human-designed attributes. However, few works study whether the human-designed attributes are discriminative enough for the recognition task. To address this problem, we propose a coupled semantic dictionaries (CSD) learning approach that exploits latent visual attributes and aligns the visual and semantic spaces at the same time. Specifically, the learned visual attributes are incorporated into the semantic representation of the image features, consolidating the discriminative visual cues for object recognition. In addition, existing ZSL methods suffer from the domain shift issue because the source domain and the target domain have completely disjoint label spaces. We therefore use the visual-semantic alignment and the latent visual attributes learned on the source domain jointly to regularise learning on the target domain, which helps the transferred information extend across domains. We formulate this as an optimization problem with a unified objective and propose an iterative solver. Extensive experiments on two challenging benchmark datasets demonstrate that the proposed approach outperforms several state-of-the-art ZSL methods.
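To make the general idea of coupled dictionaries and an iterative solver concrete, here is a minimal sketch in Python. It is not the CSD objective, which the abstract does not spell out. It assumes a simplified ridge-regularised formulation in which a visual dictionary `Dx` and a semantic dictionary `Ds` share one code matrix `Z`, and it omits the latent visual attributes and the target-domain regulariser entirely. All names (`fit_coupled_dictionaries`, `predict_unseen`, `demo`), the toy data, and the hyperparameters are hypothetical.

```python
import numpy as np


def fit_coupled_dictionaries(X, S, k, lam=1e-2, n_iter=30, seed=0):
    """Alternating ridge updates for an ASSUMED simplified objective:
    ||X - Dx Z||^2 + ||S - Ds Z||^2 + lam (||Z||^2 + ||Dx||^2 + ||Ds||^2).
    X: (d_x, n) visual features, S: (d_s, n) per-sample class attributes."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((k, X.shape[1]))  # shared codes, random init
    eye = np.eye(k)
    losses = []
    for _ in range(n_iter):
        # Dictionary step (Z fixed): D = Y Z^T (Z Z^T + lam I)^{-1}
        G = Z @ Z.T + lam * eye
        Dx = np.linalg.solve(G, Z @ X.T).T
        Ds = np.linalg.solve(G, Z @ S.T).T
        # Code step (dictionaries fixed): one ridge problem over both spaces
        H = Dx.T @ Dx + Ds.T @ Ds + lam * eye
        Z = np.linalg.solve(H, Dx.T @ X + Ds.T @ S)
        fit = np.sum((X - Dx @ Z) ** 2) + np.sum((S - Ds @ Z) ** 2)
        reg = lam * (np.sum(Z ** 2) + np.sum(Dx ** 2) + np.sum(Ds ** 2))
        losses.append(fit + reg)
    return Dx, Ds, losses


def predict_unseen(X_test, Dx, Ds, prototypes, lam=1e-2):
    """Encode test features with the visual dictionary only, map the codes
    into the semantic space, and pick the most similar unseen-class
    prototype by cosine similarity. prototypes: (d_s, n_classes)."""
    k = Dx.shape[1]
    Z = np.linalg.solve(Dx.T @ Dx + lam * np.eye(k), Dx.T @ X_test)
    S_hat = Ds @ Z
    S_hat = S_hat / (np.linalg.norm(S_hat, axis=0, keepdims=True) + 1e-12)
    P = prototypes / (np.linalg.norm(prototypes, axis=0, keepdims=True) + 1e-12)
    return np.argmax(P.T @ S_hat, axis=0)


def demo(seed=0):
    """Synthetic toy problem: features are a noisy linear function of the
    class attributes, and seen and unseen classes are disjoint."""
    rng = np.random.default_rng(seed)
    d_x, d_s, n_seen, n_unseen, per_class = 20, 4, 8, 3, 40
    W = rng.standard_normal((d_x, d_s))
    P_seen = rng.standard_normal((d_s, n_seen))
    P_unseen = rng.standard_normal((d_s, n_unseen))
    y_seen = np.repeat(np.arange(n_seen), per_class)
    y_unseen = np.repeat(np.arange(n_unseen), per_class)
    S_train = P_seen[:, y_seen]
    X_train = W @ S_train + 0.05 * rng.standard_normal((d_x, y_seen.size))
    X_test = W @ P_unseen[:, y_unseen] + 0.05 * rng.standard_normal(
        (d_x, y_unseen.size)
    )
    Dx, Ds, losses = fit_coupled_dictionaries(X_train, S_train, k=d_s)
    y_pred = predict_unseen(X_test, Dx, Ds, P_unseen)
    return float(np.mean(y_pred == y_unseen)), losses


if __name__ == "__main__":
    accuracy, losses = demo()
    print("unseen-class accuracy:", accuracy)
    print("first and last objective value:", losses[0], losses[-1])
```

Each step solves its subproblem exactly, so the recorded objective should not increase from one iteration to the next. Only the visual dictionary is used at test time because unseen-class images arrive without attribute annotations; the semantic dictionary then carries the codes into the attribute space, where they are compared with the unseen-class prototypes.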
