Dual-verification network for zero-shot learning

To mitigate visual ambiguity and domain shift in conventional zero-shot learning (ZSL), we propose the dual-verification network (DVN), which accepts features and attributes as pairwise input and verifies the result in both the attribute space and the feature space. First, the DVN projects a feature onto an orthogonal space, where the projected feature has maximum correlation with its corresponding attribute and is orthogonal to all other attributes. Second, we adopt the concept of semantic feature representation, which computes the relationship between the semantic feature and class labels. Based on this concept, we project the attributes onto the feature space by extending the attributes and labels from the class level to the instance level. In addition, we employ a deep architecture and use the cross-entropy loss to train an end-to-end network for dual verification. Extensive experiments in ZSL and generalized ZSL on four well-known datasets show that the proposed DVN performs competitively with state-of-the-art methods.
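The orthogonality condition in the first step can be illustrated with a small geometric sketch. It is not the DVN itself, which is a trained deep network. It only constructs, for each class, a unit vector that correlates positively with that class's attribute vector and is orthogonal to every other class's attribute vector. The helper names `dot` and `orthogonal_target`, the random attribute vectors, and the use of Gram-Schmidt are all assumptions made for illustration, and the sketch assumes the attribute vectors are linearly independent.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_target(own, others):
    """Remove from `own` every component lying in the span of `others`
    (Gram-Schmidt) and return the normalized remainder.

    Assumes `own` is not itself in the span of `others`.
    """
    basis = []
    for vec in others:
        # Orthogonalize this vector against the basis built so far.
        for b in basis:
            coeff = dot(vec, b)
            vec = [x - coeff * y for x, y in zip(vec, b)]
        norm = dot(vec, vec) ** 0.5
        if norm > 1e-12:  # skip vectors already in the span
            basis.append([x / norm for x in vec])
    residual = list(own)
    for b in basis:
        coeff = dot(residual, b)
        residual = [x - coeff * y for x, y in zip(residual, b)]
    norm = dot(residual, residual) ** 0.5
    return [x / norm for x in residual]


# Hypothetical class-level attribute vectors: 4 classes, 8 dimensions.
random.seed(0)
attributes = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]

# One target direction per class, orthogonal to all other classes' attributes.
targets = [
    orthogonal_target(a, attributes[:c] + attributes[c + 1:])
    for c, a in enumerate(attributes)
]

# Pairwise scores: row i is the target for class i scored against every attribute.
scores = [[dot(t, a) for a in attributes] for t in targets]
predicted = [max(range(len(row)), key=row.__getitem__) for row in scores]
```

In this toy setting each row of `scores` is zero (up to rounding) everywhere except at its own class, so a feature mapped onto its class's target direction is verified by exactly one attribute.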
