Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches

Zero-shot learning (ZSL) is commonly used to address the very pervasive problem of predicting unseen classes in fine-grained image classification and other tasks. One family of solutions is to learn synthesised unseen visual samples produced by generative models from auxiliary semantic information, such as natural language descriptions. However, for most of these models, performance suffers from noise in the form of irrelevant image backgrounds. Further, most methods do not allocate a calculated weight to each semantic patch. Yet, in the real world, the discriminative power of features can be quantified and directly leveraged to improve accuracy and reduce computational complexity. To address these issues, we propose a novel framework called multi-patch generative adversarial nets (MPGAN) that synthesises local patch features and labels unseen classes with a novel weighted voting strategy. The process begins by generating discriminative visual features from noisy text descriptions for a set of predefined local patches using multiple specialist generative models. The features synthesised from each patch for unseen classes are then used to construct an ensemble of diverse supervised classifiers, each corresponding to one local patch. A voting strategy averages the probability distributions output from the classifiers and, given that some patches are more discriminative than others, a discrimination-based attention mechanism helps to weight each patch accordingly. Extensive experiments show that MPGAN has significantly greater accuracy than state-of-the-art methods.

[1]  Zi Huang,et al.  I read, I saw, I tell: Texts Assisted Fine-Grained Visual Classification , 2018, ACM Multimedia.

[2]  Fisher Yu,et al.  TextureGAN: Controlling Deep Image Synthesis with Texture Patches , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Ling Shao,et al.  SRSC: Selective, Robust, and Supervised Constrained Feature Representation for Image Classification , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[4]  Trevor Darrell,et al.  Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[6]  Yang Yang,et al.  Zero-Shot Hashing via Transferring Supervised Knowledge , 2016, ACM Multimedia.

[7]  Yadan Luo,et al.  Cycle-Consistent Diverse Image Synthesis from Natural Language , 2019, 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[8]  Zhongfei Zhang,et al.  Stacked Semantic-Guided Attention Model for Fine-Grained Zero-Shot Learning , 2018, ArXiv.

[9]  Bernt Schiele,et al.  Multi-cue Zero-Shot Learning with Strong Supervision , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Pietro Perona,et al.  Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Anton van den Hengel,et al.  Less is More: Zero-Shot Learning from Online Textual Documents with Noise Suppression , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Deng Cai,et al.  Attribute Attention for Semantic Disambiguation in Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Bernt Schiele,et al.  F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Xiaobo Jin,et al.  Attentive Region Embedding Network for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Zhiqiang Tang,et al.  Semantic-Guided Multi-Attention Localization for Zero-Shot Learning , 2019, NeurIPS.

[16]  Philip S. Yu,et al.  Generative Dual Adversarial Network for Generalized Zero-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Mohamed Elhoseiny,et al.  Creativity Inspired Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[20]  Zi Huang,et al.  Look Deeper See Richer: Depth-aware Image Paragraph Captioning , 2018, ACM Multimedia.

[21]  Ling Shao,et al.  Discriminative Fisher Embedding Dictionary Learning Algorithm for Object Recognition , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[22]  Jie Qin,et al.  Invertible Zero-Shot Recognition Flows , 2020, ECCV.

[23]  Bernt Schiele,et al.  Feature Generating Networks for Zero-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Bingbing Ni,et al.  Zero-Shot Action Recognition with Error-Correcting Output Codes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Xiaogang Wang,et al.  StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Zhe Gan,et al.  AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Yang Yang,et al.  CANZSL: Cycle-Consistent Adversarial Networks for Zero-Shot Learning from Natural Language , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[28]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[29]  Ruslan Salakhutdinov,et al.  Generating Images from Captions with Attention , 2015, ICLR.

[30]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[31]  Zi Huang,et al.  Progressive Graph Learning for Open-Set Domain Adaptation , 2020, ICML.

[32]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Zi Huang,et al.  Alleviating Feature Confusion for Generative Zero-shot Learning , 2019, ACM Multimedia.

[34]  Zi Huang,et al.  Leveraging the Invariant Side of Generative Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Cordelia Schmid,et al.  Label-Embedding for Attribute-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Ling Shao,et al.  Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning , 2020, IEEE Transactions on Image Processing.

[37]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[38]  Ling Shao,et al.  Region Graph Embedding Network for Zero-Shot Learning , 2020, ECCV.

[39]  Zhongfei Zhang,et al.  Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning , 2018, NeurIPS.

[40]  Zi Huang,et al.  From Zero-Shot Learning to Cold-Start Recommendation , 2019, AAAI.

[41]  Cordelia Schmid,et al.  Label-Embedding for Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Ahmed M. Elgammal,et al.  Link the Head to the "Beak": Zero Shot Learning from Noisy Text Description at Part Precision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Xi Peng,et al.  A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).