Learning Deep Bilinear Transformation for Fine-grained Image Representation

Bilinear feature transformation has shown the state-of-the-art performance in learning fine-grained image representations. However, the computational cost to learn pairwise interactions between deep feature channels is prohibitively expensive, which restricts this powerful transformation to be used in deep neural networks. In this paper, we propose a deep bilinear transformation (DBT) block, which can be deeply stacked in convolutional neural networks to learn fine-grained image representations. The DBT block can uniformly divide input channels into several semantic groups. As bilinear transformation can be represented by calculating pairwise interactions within each group, the computational cost can be heavily relieved. The output of each block is further obtained by aggregating intra-group bilinear features, with residuals from the entire input features. We found that the proposed network achieves new state-of-the-art in several fine-grained image recognition benchmarks, including CUB-Bird, Stanford-Car, and FGVC-Aircraft.

[1]  Qilong Wang,et al.  Global Second-Order Pooling Convolutional Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Michael Lam,et al.  Fine-Grained Recognition as HSnet Search for Informative Image Parts , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Shu Kong,et al.  Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[7]  Qi Tian,et al.  Picking Deep Filter Responses for Fine-Grained Image Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Subhransu Maji,et al.  Improved Bilinear Pooling with CNNs , 2017, BMVC.

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Qilong Wang,et al.  Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Kilian Q. Weinberger,et al.  CondenseNet: An Efficient DenseNet Using Learned Group Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Xiu-Shen Wei,et al.  Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization , 2018, Pattern Recognit..

[18]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Yuxin Peng,et al.  The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jiaying Liu,et al.  Factorized Bilinear Models for Image Recognition , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Bo Zhao,et al.  Diversified Visual Attention Networks for Fine-Grained Object Classification , 2016, IEEE Transactions on Multimedia.

[23]  Pietro Perona,et al.  Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[24]  Yang Song,et al.  The iNaturalist Species Classification and Detection Dataset , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[26]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[27]  Marcel Simon,et al.  Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[29]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[31]  Seung Woo Lee,et al.  Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[33]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[34]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[35]  Byoung-Tak Zhang,et al.  Bilinear Attention Networks , 2018, NeurIPS.

[36]  Qilong Wang,et al.  Is Second-Order Information Helpful for Large-Scale Visual Recognition? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).