Paper Information - The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification

The key to solving fine-grained image categorization is finding discriminative, local regions that correspond to subtle visual traits. Great strides have been made, with complex networks designed specifically to learn part-level discriminative feature representations. In this paper, we show that it is possible to cultivate subtle details without overly complicated network designs or training mechanisms: a single loss is all it takes. The main trick lies in how we delve into individual feature channels early on, as opposed to the convention of starting from a consolidated feature map. The proposed loss function, termed mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component. The discriminality component forces all feature channels belonging to the same class to be discriminative, through a novel channel-wise attention mechanism. The diversity component additionally constrains the channels so that they become mutually exclusive across the spatial dimension. The end result is a set of feature channels, each of which reflects a different locally discriminative region for a specific class. The MC-Loss can be trained end-to-end, without the need for any bounding-box or part annotations, and yields highly discriminative regions during inference. Experimental results show that our MC-Loss, when implemented on top of common base networks, achieves state-of-the-art performance on all four fine-grained categorization datasets (CUB-Birds, FGVC-Aircraft, Flowers-102, and Stanford Cars). Ablation studies further demonstrate the superiority of the MC-Loss over other recently proposed general-purpose losses for visual classification, on two different base networks. Code is available at: https://github.com/dongliangchang/Mutual-Channel-Loss.
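
To make the idea of channels being "mutually exclusive across the spatial dimension" concrete, the following is a minimal sketch of one way a diversity term of this kind could be computed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the formula, so the spatial softmax followed by a cross-channel max is an assumed design, and the names `spatial_softmax`, `diversity_score`, and `diversity_component` are hypothetical. Feature channels are represented as flat Python lists of H*W activations, grouped per class.

```python
import math

def spatial_softmax(channel):
    """Normalize one flattened feature channel into a distribution over positions."""
    peak = max(channel)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in channel]
    total = sum(exps)
    return [e / total for e in exps]

def diversity_score(group):
    """Score how spatially exclusive the channels of one class group are.

    ASSUMPTION: spatial softmax per channel, then cross-channel max per
    position, then a sum over positions. `group` is a list of channels,
    each a flat list of H*W activations. The score lies in
    [1, len(group)]: it equals 1 when all channels are identical and
    approaches len(group) when each channel peaks at a different position.
    """
    maps = [spatial_softmax(ch) for ch in group]
    n_positions = len(maps[0])
    # Cross-channel max at every position, then sum over positions.
    return sum(max(m[k] for m in maps) for k in range(n_positions))

def diversity_component(groups):
    """Average the per-class diversity score over all class groups."""
    return sum(diversity_score(g) for g in groups) / len(groups)

# Two channels firing at the same location versus at different locations.
overlapping = [[5.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
exclusive = [[20.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 20.0]]
```

Under this sketch, `diversity_score(overlapping)` sits at the lower bound of 1, while `diversity_score(exclusive)` is close to the upper bound of 2, so a training objective that rewards a larger value would push the channels of a class toward attending to different regions.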
