Cross Attention Network for Few-shot Classification

Few-shot classification aims to recognize unlabeled samples from unseen classes given only a few labeled samples. The unseen classes and the low-data regime make few-shot classification very challenging. Many existing approaches extract features from labeled and unlabeled samples independently; as a result, the features are not discriminative enough. In this work, we propose a novel Cross Attention Network to address these challenges. First, a Cross Attention Module is introduced to deal with the problem of unseen classes. The module generates cross attention maps for each pair of class feature and query sample feature so as to highlight the target object regions, making the extracted features more discriminative. Second, a transductive inference algorithm is proposed to alleviate the low-data problem: it iteratively uses the unlabeled query set to augment the support set, thereby making the class features more representative. Extensive experiments on two benchmarks show that our method is a simple, effective and computationally efficient framework that outperforms state-of-the-art methods.
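To make the idea of cross attention maps concrete, here is a minimal sketch of how attention between a class (support) feature map and a query feature map might be computed. The function name, the cosine-similarity correlation, and the (1 + attention) reweighting are illustrative assumptions, not the paper's exact module.

```python
# Minimal sketch: cross attention between a class feature map and a query
# feature map, each a PyTorch tensor of shape (C, H, W). The correlation
# form and fusion are assumptions for illustration only.
import torch
import torch.nn.functional as F

def cross_attention_maps(class_feat: torch.Tensor, query_feat: torch.Tensor):
    """Return the two feature maps reweighted so that spatial positions
    correlated with the other feature map are emphasized."""
    C, H, W = class_feat.shape
    p = class_feat.reshape(C, H * W)   # class positions as C-dim vectors
    q = query_feat.reshape(C, H * W)   # query positions as C-dim vectors

    # Cosine-similarity correlation between every class/query position pair.
    p_n = F.normalize(p, dim=0)
    q_n = F.normalize(q, dim=0)
    corr = p_n.t() @ q_n               # shape (H*W, H*W)

    # Aggregate correlations per position, then normalize spatially.
    class_attn = torch.softmax(corr.mean(dim=1), dim=0).reshape(1, H, W)
    query_attn = torch.softmax(corr.mean(dim=0), dim=0).reshape(1, H, W)

    # Reweight features so target-object regions stand out.
    return class_feat * (1 + class_attn), query_feat * (1 + query_attn)

# Example usage: 64-channel 6x6 feature maps from a shallow backbone.
cls_f, qry_f = torch.randn(64, 6, 6), torch.randn(64, 6, 6)
attended_cls, attended_qry = cross_attention_maps(cls_f, qry_f)
```

In an episodic setting, such a pairing would be computed for every (class prototype, query) combination before the metric-based classification step.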
