Heterogeneous Data Integration using Confidence Estimation of Unseen Visual Data for Zero-shot Learning

Zero-shot learning is a methodology for recognizing concepts that have never been seen during the training phase. Recently, interest in zero-shot learning has grown with heterogeneous data integration methods that embed multi-modal data into a common vector space. However, because existing methods compare heterogeneous data by focusing on the similarity between individual vectors, zero-shot learning performance degrades as the number of semantic candidates increases. We propose a heterogeneous data integration methodology that uses a confidence estimator for unseen visual data, which estimates whether an input is unseen and outputs a confidence measure. The proposed methodology builds a more effective zero-shot learning model by applying the estimated confidence of an unseen visual input to the visual-semantic distance obtained from the heterogeneous data integration model. Experiments show that the proposed methodology improves zero-shot learning performance on unseen data at the cost of a small performance decrease on seen data.
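The idea of weighting visual-semantic distances by an unseen-data confidence score can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the prototype representation, and the specific weighting scheme (scaling seen-class distances by the unseen-confidence and unseen-class distances by its complement) are all assumptions made for clarity.

```python
import numpy as np

def confidence_weighted_prediction(visual_emb, seen_protos, unseen_protos, conf_unseen):
    """Pick the nearest semantic candidate in the common embedding space,
    biasing the comparison by the estimated confidence that the input is unseen.

    visual_emb:   embedded visual input, shape (d,)
    seen_protos:  seen-class semantic embeddings, shape (n_seen, d)
    unseen_protos: unseen-class semantic embeddings, shape (n_unseen, d)
    conf_unseen:  estimated probability in [0, 1] that the input is unseen
                  (illustrative weighting, not the paper's exact rule)
    """
    # Euclidean distances to each class prototype in the common space.
    d_seen = np.linalg.norm(seen_protos - visual_emb, axis=1)
    d_unseen = np.linalg.norm(unseen_protos - visual_emb, axis=1)

    # High unseen-confidence inflates seen distances and shrinks unseen ones,
    # so the prediction shifts toward unseen candidates, and vice versa.
    d_seen_adj = d_seen * conf_unseen
    d_unseen_adj = d_unseen * (1.0 - conf_unseen)

    if d_seen_adj.min() <= d_unseen_adj.min():
        return "seen", int(d_seen_adj.argmin())
    return "unseen", int(d_unseen_adj.argmin())
```

With a high unseen-confidence, an input lying closer to a seen prototype can still be assigned to an unseen class, which is the effect the abstract describes: reducing misclassification of unseen inputs when many semantic candidates compete.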
