Zero-Shot Classification Based on Word Vector Enhancement and Distance Metric Learning

Zero-shot classification has attracted wide attention in recent years because it requires no labeled samples for new categories, which reduces annotation cost in applications. This paper presents a zero-shot image classification method based on word vector enhancement and distance metric learning. Specifically, a convolutional neural network (CNN) is employed to extract image feature vectors that have the same dimension as the semantic feature vectors. Word vectors are then learned in an unsupervised manner from a Wikipedia corpus using the skip-gram model. An improved analysis dictionary learning model is used to reduce redundant information in the word vectors. The resulting sparse vectors serve as semantic features, and a distance metric learning method is employed to measure the distance between image features and semantic features. Finally, classification is performed by a nearest-neighbor classifier. The effectiveness of the proposed algorithm is validated on the AwA and CUB data sets. Experimental results demonstrate that the proposed method performs well in terms of both accuracy and robustness.
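
The final step of the pipeline, nearest-neighbor classification under a learned distance, can be illustrated with a minimal sketch. The abstract does not state the form of the metric, so the sketch assumes a Mahalanobis-style distance d(x, p) = (x - p)^T M (x - p) with a positive semidefinite matrix M. The function names, the toy class vectors, and the fixed matrix M are all hypothetical and stand in for the learned quantities; this is not the authors' implementation.

```python
import numpy as np

def mahalanobis_sq(x, prototypes, M):
    """Squared distance (p - x)^T M (p - x) from x to each row p of prototypes."""
    diff = prototypes - x  # shape (num_classes, d)
    return np.einsum("ij,jk,ik->i", diff, M, diff)

def nearest_class(x, prototypes, M):
    """Index of the class whose semantic vector is closest to x under M."""
    return int(np.argmin(mahalanobis_sq(x, prototypes, M)))

# Toy semantic vectors for 3 unseen classes (made-up numbers, 4 dimensions).
prototypes = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

# M = L^T L is positive semidefinite by construction.
# Assumption: L is fixed by hand here; in practice it would be learned.
L = np.diag([1.0, 2.0, 1.0, 0.5])
M = L.T @ L

# A hypothetical image feature already mapped to the semantic dimension.
image_feature = np.array([0.1, 0.9, 0.1, 0.0])
distances = mahalanobis_sq(image_feature, prototypes, M)
predicted = nearest_class(image_feature, prototypes, M)
print(distances, predicted)
```

Because the image features and the semantic features share the same dimension, each test image can be compared directly against every class vector, and no labeled images of the unseen classes are needed at prediction time.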
