Compare More Nuanced: Pairwise Alignment Bilinear Network for Few-Shot Fine-Grained Learning

The recognition ability of human beings is developed in a progressive way. Usually, children learn to discriminate various objects from coarse to fine-grained with limited supervision. Inspired by this learning process, we propose a simple yet effective model for the Few-Shot Fine-Grained (FSFG) recognition, which tries to tackle the challenging fine-grained recognition task using meta-learning. The proposed method, named Pairwise Alignment Bilinear Network (PABN), is an end-to-end deep neural network. Unlike traditional deep bilinear networks for fine-grained classification, which adopt the self-bilinear pooling to capture the subtle features of images, the proposed model uses a novel pairwise bilinear pooling to compare the nuanced differences between base images and query images for learning a deep distance metric. In order to match base image features with query image features, we design feature alignment losses before the proposed pairwise bilinear pooling. Experiment results on four fine-grained classification datasets and one generic few-shot dataset demonstrate that the proposed model outperforms both the state-of-the-art few-shot fine-grained and general few-shot methods.

[1]  Yang Song,et al.  The iNaturalist Species Classification and Detection Dataset , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Yongdong Zhang,et al.  One-Shot Fine-Grained Instance Retrieval , 2017, ACM Multimedia.

[3]  Yi Yang,et al.  Transductive Propagation Network for Few-shot Learning , 2018, ArXiv.

[4]  Yu Qiao,et al.  WildFish: A Large Benchmark for Fish Recognition in the Wild , 2018, ACM Multimedia.

[5]  Qilong Wang,et al.  Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Patrick Jähnichen,et al.  Cross-modal Hallucination for Few-shot Fine-grained Recognition , 2018, ArXiv.

[8]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[9]  Hongdong Li,et al.  A Direct Method to Self-Calibrate a Surveillance Camera by Observing a Walking Pedestrian , 2009, 2009 Digital Image Computing: Techniques and Applications.

[10]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[11]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[14]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[15]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[17]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Pietro Perona,et al.  Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Xiu-Shen Wei,et al.  Piecewise Classifier Mappings: Learning Fine-Grained Learners for Novel Categories With Few Examples , 2018, IEEE Transactions on Image Processing.

[21]  Jian Zhang,et al.  Face Detection with Effective Feature Extraction , 2010, ACCV.

[22]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[23]  Eunho Yang,et al.  Learning to Propagate Labels: Transductive Propagation Network for Few-Shot Learning , 2018, ICLR.