Compositional Fine-Grained Low-Shot Learning

We develop a novel compositional generative model for zero- and few-shot learning to recognize fine-grained classes with a few or no training samples. Our key observation is that generating holistic features for fine-grained classes fails to capture small attribute differences between classes. Therefore, we propose a feature composition framework that learns to extract attribute features from training samples and combines them to construct fine-grained features for rare and unseen classes. Feature composition allows us to not only selectively compose features of every class from only relevant training samples, but also obtain diversity among composed features via changing samples used for the composition. In addition, instead of building holistic features for classes, we use our attribute features to form dense representations capable of capturing fine-grained attribute details of classes. We propose a training scheme that uses a discriminative model to construct features that are subsequently used to train the model itself. Therefore, we directly train the discriminative model on the composed features without learning a separate generative model. We conduct experiments on four popular datasets of DeepFashion, AWA2, CUB, and SUN, showing the effectiveness of our method.

[1]  Philip S. Yu,et al.  Generative Dual Adversarial Network for Generalized Zero-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yun Fu,et al.  Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Shu Kong,et al.  Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Dat T. Huynh,et al.  Interactive Multi-Label CNN Learning With Partial Labels , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Yoshua Bengio,et al.  Mode Regularized Generative Adversarial Networks , 2016, ICLR.

[6]  Marc'Aurelio Ranzato,et al.  Task-Driven Modular Networks for Zero-Shot Compositional Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Qixiang Ye,et al.  Selective Sparse Sampling for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Matthew A. Brown,et al.  Low-Shot Learning with Imprinted Weights , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[10]  Dacheng Tao,et al.  Learning Unseen Concepts via Hierarchical Decomposition and Composition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Trevor Darrell,et al.  Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Bolei Zhou,et al.  Seeing What a GAN Cannot Generate , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Christoph H. Lampert,et al.  Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  XiangTao,et al.  Transductive Multi-View Zero-Shot Learning , 2015 .

[16]  Errui Ding,et al.  Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition , 2018, ECCV.

[17]  Ramesh Raskar,et al.  Pairwise Confusion for Fine-Grained Visual Classification , 2017, ECCV.

[18]  Zhongfei Zhang,et al.  Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning , 2018, NeurIPS.

[19]  H. T. Kung,et al.  Stable and Efficient Representation Learning with Nonnegativity Constraints , 2014, ICML.

[20]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Zhengming Ding,et al.  Marginalized Latent Semantic Encoder for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Xiaogang Wang,et al.  DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Razvan Pascanu,et al.  Meta-Learning with Latent Embedding Optimization , 2018, ICLR.

[24]  Oriol Vinyals,et al.  Neural Discrete Representation Learning , 2017, NIPS.

[25]  Mohammad Norouzi,et al.  Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse , 2019, NeurIPS.

[26]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[27]  Dat T. Huynh,et al.  Fine-Grained Generalized Zero-Shot Learning via Dense Attribute-Based Attention , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Dat T. Huynh,et al.  A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Bernt Schiele,et al.  Zero-Shot Learning — The Good, the Bad and the Ugly , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yu-Chiang Frank Wang,et al.  Multi-label Zero-Shot Learning with Structured Knowledge Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Song-Chun Zhu,et al.  Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Yoshua Bengio,et al.  MetaGAN: An Adversarial Approach to Few-Shot Learning , 2018, NeurIPS.

[33]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[34]  Julien Rabin,et al.  Detecting Overfitting of Deep Generative Networks via Latent Recovery , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Xiaobo Jin,et al.  Attentive Region Embedding Network for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  James Hays,et al.  SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Michel Crucianu,et al.  Modeling Inter and Intra-Class Relations in the Triplet Loss for Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Bharath Hariharan,et al.  Few-Shot Learning With Localization in Realistic Settings , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Dat Huynh,et al.  Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition , 2020, NeurIPS.

[40]  Ying Wu,et al.  Recognizing Part Attributes With Insufficient Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Bernt Schiele,et al.  Feature Generating Networks for Zero-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Jianwen Xie,et al.  Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Joshua B. Tenenbaum,et al.  Meta-Learning for Semi-Supervised Few-Shot Classification , 2018, ICLR.

[45]  Cewu Lu,et al.  Deep LAC: Deep localization, alignment and classification for fine-grained recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Venkatesh Saligrama,et al.  Generalized Zero-Shot Recognition Based on Visually Semantic Embedding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[48]  Ali Razavi,et al.  Preventing Posterior Collapse with delta-VAEs , 2019, ICLR.

[49]  Jacob Andreas,et al.  Measuring Compositionality in Representation Learning , 2019, ICLR.

[50]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[51]  Bernt Schiele,et al.  Attribute Prototype Network for Zero-Shot Learning , 2020, NeurIPS.

[52]  Shuqiang Jiang,et al.  Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition , 2020, IJCAI.

[53]  Gal Chechik,et al.  A causal view of compositional zero-shot recognition , 2020, NeurIPS.

[54]  Abhinav Gupta,et al.  Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Martial Hebert,et al.  Learning Compositional Representations for Few-Shot Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[56]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[57]  Zhiqiang Tang,et al.  Learning where to look: Semantic-Guided Multi-Attention Localization for Zero-Shot Learning , 2019, ArXiv.

[58]  Bharath Hariharan,et al.  Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[60]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[61]  Fei-Fei Li,et al.  Attribute Learning in Large-Scale Datasets , 2010, ECCV Workshops.

[62]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[63]  R. Devon Hjelm,et al.  Locality and compositionality in zero-shot learning , 2019, ICLR.

[64]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[65]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[66]  Xinge You,et al.  Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition , 2018, ECCV.

[67]  Martial Hebert,et al.  Low-Shot Learning from Imaginary Data , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[68]  Jian Ni,et al.  Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning , 2019, NeurIPS.

[69]  Larry S. Davis,et al.  Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[70]  Pedro H. O. Pinheiro,et al.  Adaptive Cross-Modal Few-Shot Learning , 2019, NeurIPS.

[71]  Frédéric Jurie,et al.  Generating Visual Representations for Zero-Shot Classification , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[72]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Wei Shen,et al.  Few-Shot Image Recognition by Predicting Parameters from Activations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[74]  Hyeonwoo Yu,et al.  Zero-shot Learning via Simultaneous Generating and Learning , 2019, NeurIPS.

[75]  Wei-Lun Chao,et al.  Synthesized Classifiers for Zero-Shot Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Rogério Schmidt Feris,et al.  Delta-encoder: an effective sample synthesis method for few-shot object recognition , 2018, NeurIPS.

[77]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[78]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[79]  Hao Wang,et al.  Rethinking Knowledge Graph Propagation for Zero-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[80]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Charles A. Sutton,et al.  VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning , 2017, NIPS.

[82]  Bharath Hariharan,et al.  Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[83]  Jo Yew Tham,et al.  Learning Attribute Representations with Localization for Flexible Fashion Search , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[84]  Michael I. Jordan,et al.  Generalized Zero-Shot Learning with Deep Calibration Network , 2018, NeurIPS.

[85]  Dacheng Tao,et al.  Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[86]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Bernt Schiele,et al.  F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[88]  Martial Hebert,et al.  From Red Wine to Red Tomato: Composition with Context , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[89]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[90]  Deng Cai,et al.  Attribute Attention for Semantic Disambiguation in Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[91]  Martial Hebert,et al.  Image Deformation Meta-Networks for One-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[92]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[93]  Xiu-Shen Wei,et al.  Multi-Label Image Recognition With Graph Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[94]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[95]  Ahmed M. Elgammal,et al.  Link the Head to the "Beak": Zero Shot Learning from Noisy Text Description at Part Precision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[96]  Yin Li,et al.  Compositional Learning for Human Object Interaction , 2018, ECCV.

[97]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Joint Latent Similarity Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[98]  Piyush Rai,et al.  Generalized Zero-Shot Learning via Synthesized Examples , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[99]  Gal Chechik,et al.  Adaptive Confidence Smoothing for Generalized Zero-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[100]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[101]  Jiebo Luo,et al.  Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[102]  Ramazan Gokberk Cinbis,et al.  Gradient Matching Generative Networks for Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[103]  Colin Raffel,et al.  Towards GAN Benchmarks Which Require Generalization , 2020, ICLR.