Boosting Zero-Shot Image Classification via Pairwise Relationship Learning

Zero-shot image classification (ZSIC) is an emerging challenge in computer vision, artificial intelligence, and machine learning. In this paper, we propose to exploit the pairwise relationships between test instances to improve the performance of conventional ZSIC methods such as direct attribute prediction (DAP). To infer these relationships, we introduce two methods: one based on binary classification and one based on metric learning. From the inferred relationships, we construct a similarity graph over the test instances and then employ an adaptive graph anchors voting method to refine the results of DAP iteratively: in each iteration, we partition the similarity graph with normalized spectral clustering and determine the class label of each cluster by a vote among the graph anchors. Extensive experiments validate the effectiveness of our method: with properly learned pairwise relationships, we boost the mean class accuracy of DAP on two standard ZSIC benchmarks, Animals with Attributes and aPascal-aYahoo, from \(57.46\%\) to \(84.43\%\) and from \(26.59\%\) to \(70.09\%\), respectively. Experimental results on the SUN Attribute dataset further suggest that our method yields a considerable performance improvement on the large-scale ZSIC problem.
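
As a rough illustration of one refinement iteration described above, the following minimal sketch partitions a similarity graph with normalized spectral clustering and relabels each cluster by a vote among anchors. It is not the authors' implementation: the anchor rule (the most confident initial predictions per class), the function names, and the parameter `n_anchors` are assumptions made for illustration, and the adaptive, iterative anchor update of the paper is not reproduced.

```python
import numpy as np

def spectral_embedding(W, k):
    """Row-normalized eigenvectors of the symmetric normalized Laplacian."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, :k]                      # k smallest eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def refine_once(W, scores, n_anchors=2):
    """One clustering-and-voting pass over initial class scores (e.g. from DAP).

    W      : (n, n) symmetric non-negative similarity matrix
    scores : (n, C) initial per-class scores
    Assumption: anchors are the n_anchors most confident instances per class.
    """
    n, C = scores.shape
    labels = scores.argmax(axis=1)
    is_anchor = np.zeros(n, dtype=bool)
    for c in range(C):
        idx = np.where(labels == c)[0]
        top = idx[np.argsort(-scores[idx, c])[:n_anchors]]
        is_anchor[top] = True
    clusters = kmeans(spectral_embedding(W, C), C)
    refined = labels.copy()
    for j in range(C):
        members = clusters == j
        votes = np.bincount(labels[members & is_anchor], minlength=C)
        if votes.sum() > 0:              # clusters without anchors keep their labels
            refined[members] = votes.argmax()
    return refined
```

In this sketch, an instance that was initially misclassified is corrected whenever it falls into a cluster whose anchors agree on another class, which is the intuition behind using pairwise relationships to refine per-instance predictions.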
