TOAN: Target-Oriented Alignment Network for Fine-Grained Image Categorization With Few Labeled Samples

The challenges of high intra-class variance yet low inter-class fluctuations in fine-grained visual categorization are more severe with few labeled samples, \textit{i.e.,} Fine-Grained categorization problems under the Few-Shot setting (FGFS). High-order features are usually developed to uncover subtle differences between sub-categories in FGFS, but they are less effective in handling the high intra-class variance. In this paper, we propose a Target-Oriented Alignment Network (TOAN) to investigate the fine-grained relation between the target query image and support classes. The feature of each support image is transformed to match the query ones in the embedding feature space, which reduces the disparity explicitly within each category. Moreover, different from existing FGFS approaches devise the high-order features over the global image with less explicit consideration of discriminative parts, we generate discriminative fine-grained features by integrating compositional concept representations to global second-order pooling. Extensive experiments are conducted on four fine-grained benchmarks to demonstrate the effectiveness of TOAN compared with the state-of-the-art models.

[1]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2]  Jiebo Luo,et al.  Distribution Consistency Based Covariance Metric Networks for Few-Shot Learning , 2019, AAAI.

[3]  Dacheng Tao,et al.  Collect and Select: Semantic Alignment Metric Learning for Few-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Xilin Chen,et al.  Cross Attention Network for Few-shot Classification , 2019, NeurIPS.

[5]  Sung Whan Yoon,et al.  TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning , 2019, ICML.

[6]  Patrick Pérez,et al.  Boosting Few-Shot Visual Learning With Self-Supervision , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[8]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[9]  Lei Zhang,et al.  Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Hongguang Zhang,et al.  Power Normalizing Second-Order Similarity Network for Few-Shot Learning , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Qilong Wang,et al.  Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Junjie Zhang,et al.  Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification , 2019, ArXiv.

[15]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  David J. Crandall,et al.  Meta-Reinforced Synthetic Data for One-Shot Fine-Grained Visual Recognition , 2019, NeurIPS.

[17]  Lei Wang,et al.  DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition , 2017, ECCV.

[18]  Kui Jia,et al.  PARN: Position-Aware Relation Networks for Few-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Kate Saenko,et al.  Weakly-supervised Compositional FeatureAggregation for Few-shot Recognition , 2019, ArXiv.

[20]  Shuqiang Jiang,et al.  Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition , 2020, IJCAI.

[21]  Yuxin Peng,et al.  Fine-Grained Visual-Textual Representation Learning , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[22]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[23]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[26]  D. Marr,et al.  Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[27]  Tao Mei,et al.  Destruction and Construction Learning for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[29]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[30]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[31]  Yuxin Peng,et al.  Which and How Many Regions to Gaze: Focus Discriminative Regions for Fine-Grained Visual Categorization , 2019, International Journal of Computer Vision.

[32]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Patrick Jähnichen,et al.  Discriminative Hallucination for Multi-Modal Few-Shot Learning , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[34]  Fatih Murat Porikli,et al.  Region Covariance: A Fast Descriptor for Detection and Classification , 2006, ECCV.

[35]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[36]  Xiaogang Wang,et al.  Finding Task-Relevant Features for Few-Shot Learning by Category Traversal , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Martial Hebert,et al.  Low-Shot Learning from Imaginary Data , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Xiu-Shen Wei,et al.  Piecewise Classifier Mappings: Learning Fine-Grained Learners for Novel Categories With Few Examples , 2018, IEEE Transactions on Image Processing.

[39]  Bharath Hariharan,et al.  Few-Shot Learning With Localization in Realistic Settings , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yu-Chiang Frank Wang,et al.  A Closer Look at Few-shot Classification , 2019, ICLR.

[41]  Qi Tian,et al.  Picking Deep Filter Responses for Fine-Grained Image Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[43]  Xiantong Zhen,et al.  Attentional Kernel Encoding Networks for Fine-Grained Visual Categorization , 2021, IEEE Transactions on Circuits and Systems for Video Technology.

[44]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[45]  Fatih Murat Porikli,et al.  A Deeper Look at Power Normalizations , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Anoop Cherian,et al.  Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Hui Tang,et al.  Fine-Grained Visual Categorization using Meta-Learning Optimization with Sample Selection of Auxiliary Data , 2018, ECCV.

[48]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[49]  Marcel Simon,et al.  Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Lei Wang,et al.  Revisiting Local Descriptor Based Image-To-Class Measure for Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Eunho Yang,et al.  Learning to Propagate Labels: Transductive Propagation Network for Few-Shot Learning , 2018, ICLR.

[53]  Chunjie Zhang,et al.  Few-Shot Visual Classification Using Image Pairs With Binary Transformation , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[54]  Rui Zhang,et al.  Museum Exhibit Identification Challenge for the Supervised Domain Adaptation and Beyond , 2018, ECCV.

[55]  Jiebo Luo,et al.  Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Pietro Perona,et al.  Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Jiebo Luo,et al.  Learning Deep Bilinear Transformation for Fine-grained Image Representation , 2019, NeurIPS.

[59]  Martial Hebert,et al.  Learning to Learn: Model Regression Networks for Easy Small Sample Learning , 2016, ECCV.

[60]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[61]  Hong Yu,et al.  Meta Networks , 2017, ICML.

[62]  Nikos Komodakis,et al.  Dynamic Few-Shot Visual Learning Without Forgetting , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[63]  Yizhou Yu,et al.  Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification From the Bottom Up , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Xinge You,et al.  Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition , 2018, ECCV.

[65]  Yuxin Peng,et al.  Only Learn One Sample: Fine-Grained Visual Categorization with One Sample Training , 2018, ACM Multimedia.

[66]  Donald D. Hoffman,et al.  Parts of recognition , 1984, Cognition.

[67]  Sergey Levine,et al.  Meta-Learning with Implicit Gradients , 2019, NeurIPS.

[68]  Jingdong Wang,et al.  Interleaved Group Convolutions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[69]  Qiang Wu,et al.  Compare More Nuanced: Pairwise Alignment Bilinear Network for Few-Shot Fine-Grained Learning , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[70]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.