Modality-independent adversarial network for generalized zero-shot image classification

Zero-Shot Learning (ZSL) aims to classify images of unseen target classes by transferring knowledge from seen source classes through semantic embeddings. The core of ZSL research is to embed both the visual representation of an object instance and the semantic description of its class into a joint latent space, learning cross-modal (visual and semantic) latent representations. However, the representations learned by existing methods often fail to fully capture the underlying cross-modal semantic consistency, and some of them are highly similar to one another and therefore poorly discriminative. To address these issues, in this paper we propose a novel deep framework, called the Modality Independent Adversarial Network (MIANet), for Generalized Zero-Shot Learning (GZSL); it is an end-to-end deep architecture with three submodules. First, both the visual features and the semantic descriptions are embedded into a latent hyperspherical space, where two orthogonality constraints encourage the learned latent representations to be discriminative. Second, a modality-adversarial submodule makes the latent representations independent of modality, so that the shared representations capture more high-level cross-modal semantic information during training. Third, a cross-reconstruction submodule reconstructs each latent representation into its counterpart from the other modality, rather than into itself, so that it captures more modality-irrelevant information. Comprehensive experiments on five widely used benchmark datasets, under both the GZSL and the standard ZSL settings, demonstrate the effectiveness of the proposed method.
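The three submodules can be illustrated with a minimal numpy toy. This is a sketch under stated assumptions, not the paper's implementation: the feature dimensions (2048-d visual, 312-d semantic, 64-d latent), the exact form of the orthogonality penalty, and the linear encoders/decoders/discriminator are all hypothetical stand-ins for the actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 64  # assumed latent size; the paper's choice may differ

# Hypothetical linear encoders: visual (2048-d) and semantic (312-d) heads.
W_v = rng.standard_normal((2048, LATENT_DIM)) * 0.01
W_s = rng.standard_normal((312, LATENT_DIM)) * 0.01

def encode(x, W):
    # Embed into the latent space, then project onto the unit hypersphere.
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def ortho_penalty(W):
    # One plausible orthogonality constraint: push each head's projection
    # columns toward orthonormality, ||W^T W - I||_F^2.
    G = W.T @ W
    return np.sum((G - np.eye(G.shape[0])) ** 2)

def modality_adv_loss(z_v, z_s, w_d):
    # Logistic discriminator tries to tell which modality a latent code
    # came from; the encoders would be trained adversarially to fool it,
    # making the shared representations modality-independent.
    logits = np.concatenate([z_v, z_s]) @ w_d
    labels = np.concatenate([np.ones(len(z_v)), np.zeros(len(z_s))])
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# Hypothetical cross-reconstruction decoders: each latent code is decoded
# into the *other* modality's feature space, never its own.
D_v2s = rng.standard_normal((LATENT_DIM, 312)) * 0.01
D_s2v = rng.standard_normal((LATENT_DIM, 2048)) * 0.01

def cross_recon_loss(x_v, x_s):
    z_v, z_s = encode(x_v, W_v), encode(x_s, W_s)
    return (np.mean((z_v @ D_v2s - x_s) ** 2)
            + np.mean((z_s @ D_s2v - x_v) ** 2))
```

In a full training loop these terms would be combined with a classification objective and the discriminator trained against the encoders; the weighting between the losses is an additional design choice not shown here.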
