Semantic Bilinear Pooling for Fine-Grained Recognition

Naturally, fine-grained recognition, e.g., vehicle identification or bird classification, has specific hierarchical labels, where fine categories are always harder to be classified than coarse categories. However, most of the recent deep learning based methods neglect the semantic structure of fine-grained objects and do not take advantage of the traditional fine-grained recognition techniques (e.g. coarse-to-fine classification). In this paper, we propose a novel framework with a two-branch network (coarse branch and fine branch), i.e., semantic bilinear pooling, for fine-grained recognition with a hierarchical label tree. This framework can adaptively learn the semantic information from the hierarchical levels. Specifically, we design a generalized cross-entropy loss for the training of the proposed framework to fully exploit the semantic priors via considering the relevance between adjacent levels and enlarge the distance between samples of different coarse classes. Furthermore, our method leverages only the fine branch when testing so that it adds no overhead to the testing time. Experimental results show that our proposed method achieves state-of-the-art performance on four public datasets.

[1]  Errui Ding,et al.  Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition , 2018, ECCV.

[2]  Qixiang Ye,et al.  Selective Sparse Sampling for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Lei Zhang,et al.  G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Shu Kong,et al.  Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Feng Zhou,et al.  Fine-Grained Image Classification by Exploring Bipartite-Graph Labels , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[9]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Xiao Liu,et al.  Fully Convolutional Attention Localization Networks: Efficient Attention Localization for Fine-Grained Recognition , 2016, ArXiv.

[11]  Dong Wang,et al.  Learning to Navigate for Fine-grained Classification , 2018, ECCV.

[12]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[13]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[14]  Wenxi Wu,et al.  Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding , 2018, ACM Multimedia.

[15]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Larry S. Davis,et al.  Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[18]  Lei Zhang,et al.  Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[20]  Dacheng Tao,et al.  Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Huibing Wang,et al.  Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition , 2017, IEEE Transactions on Intelligent Transportation Systems.

[23]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[25]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[26]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Yuexian Zou,et al.  Using Coarse Label Constraint for Fine-Grained Visual Classification , 2019, MMM.

[28]  Subhransu Maji,et al.  Improved Bilinear Pooling with CNNs , 2017, BMVC.

[29]  Larry S. Davis,et al.  Cross-X Learning for Fine-Grained Visual Categorization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Qilong Wang,et al.  Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Fei Xiong,et al.  MoNet: Moments Embedding Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Tao Mei,et al.  Destruction and Construction Learning for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[35]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[36]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).