论文信息 - Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition

Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition

Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, which often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals by Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. Specifically, TASN consists of 1) a trilinear attention module, which generates attention maps by modeling the inter-channel relationships, 2) an attention-based sampler which highlights attended parts with high resolution, and 3) a feature distiller, which distills part features into an object-level feature by weight sharing and feature preserving strategies. Extensive experiments verify that TASN yields the best performance under the same settings with the most competitive approaches, in iNaturalist-2017, CUB-Bird, and Stanford-Cars datasets.

[1] Luc Devroye,et al. Sample-based non-uniform random variate generation , 1986, WSC '86.

[2] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[3] Simon Haykin,et al. GradientBased Learning Applied to Document Recognition , 2001 .

[4] Pietro Perona,et al. Caltech-UCSD Birds 200 , 2010 .

[5] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[7] Pietro Perona,et al. Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[8] Seung Woo Lee,et al. Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[10] Xiaoou Tang,et al. A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[12] Yuxin Peng,et al. The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Zhiqiang Shen,et al. Multiple Granularity Descriptors for Fine-Grained Categorization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14] Marcel Simon,et al. Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[16] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[17] Andrew Zisserman,et al. Spatial Transformer Networks , 2015, NIPS.

[18] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Qi Tian,et al. Picking Deep Filter Responses for Fine-Grained Image Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Jonghyun Choi,et al. Mining Discriminative Triplets of Patches for Fine-Grained Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Xiao Liu,et al. Fully Convolutional Attention Localization Networks: Efficient Attention Localization for Fine-Grained Recognition , 2016, ArXiv.

[22] Xiao Liu,et al. Fully Convolutional Attention Networks for Fine-Grained Recognition , 2016 .

[23] Wu Liu,et al. Large-scale vehicle re-identification in urban surveillance videos , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[24] Junmo Kim,et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[26] Zhichao Li,et al. Dynamic Computational Time for Visual Attention , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[27] Tao Mei,et al. Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28] Bhuvana Ramabhadran,et al. Efficient Knowledge Distillation from an Ensemble of Teachers , 2017, INTERSPEECH.

[29] Michael Lam,et al. Fine-Grained Recognition as HSnet Search for Informative Image Parts , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Tao Mei,et al. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Xiu-Shen Wei,et al. Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval , 2016, IEEE Transactions on Image Processing.

[32] Yang Song,et al. The iNaturalist Species Classification and Detection Dataset , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33] Qilong Wang,et al. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34] Dong Wang,et al. Learning to Navigate for Fine-grained Classification , 2018, ECCV.

[35] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36] Yang Song,et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37] Wojciech Matusik,et al. Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks , 2018, ECCV.

[38] Byoung-Tak Zhang,et al. Bilinear Attention Networks , 2018, NeurIPS.

[39] Errui Ding,et al. Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition , 2018, ECCV.

[40] Xiu-Shen Wei,et al. Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization , 2018, Pattern Recognit..

[41] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42] Jin Young Choi,et al. Knowledge Distillation with Adversarial Samples Supporting Decision Boundary , 2018, AAAI.