Fine-Grained Attention for Weakly Supervised Object Localization

Although recent advances in deep learning accelerated an improvement in a weakly supervised object localization (WSOL) task, there are still challenges to identify the entire body of an object, rather than only discriminative parts. In this paper, we propose a novel residual fine-grained attention (RFGA) module that autonomously excites the less activated regions of an object by utilizing information distributed over channels and locations within feature maps in combination with a residual operation. To be specific, we devise a series of mechanisms of triple-view attention representation, attention expansion, and feature calibration. Unlike other attention-based WSOL methods that learn a coarse attention map, having the same values across elements in feature maps, our proposed RFGA learns finegrained values in an attention map by assigning different attention values for each of the elements. We validated the superiority of our proposed RFGA module by comparing it with the recent methods in the literature over three datasets. Further, we analyzed the effect of each mechanism in our RFGA and visualized attention maps to get insights.

[1]  A. Yuille,et al.  Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation , 2020, ECCV.

[2]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Qing-Long Zhang,et al.  SA-Net: Shuffle Attention for Deep Convolutional Neural Networks , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jiebo Luo,et al.  Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Shuiwang Ji,et al.  Kronecker Attention Networks , 2020, KDD.

[7]  Weilin Huang,et al.  Channel Interaction Networks for Fine-Grained Image Categorization , 2020, AAAI.

[8]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Seong Joon Oh,et al.  Evaluating Weakly Supervised Object Localization Methods Right , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Hyunjung Shim,et al.  Attention-Based Dropout Layer for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Gang Sun,et al.  Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks , 2018, NeurIPS.

[12]  Hyo-Eun Kim,et al.  SRM: A Style-Based Recalibration Module for Convolutional Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Ildoo Kim,et al.  Spatially Attentive Output Layer for Image Classification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[15]  Dahun Kim,et al.  Two-Phase Learning for Weakly Supervised Object Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Qilong Wang,et al.  ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Stephen Lin,et al.  GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[19]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Yunchao Wei,et al.  CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Yi Zhu,et al.  Soft Proposal Networks for Weakly Supervised Object Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[24]  Yi Yang,et al.  Adversarial Complementary Learning for Weakly Supervised Object Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Changhu Wang,et al.  Improving Convolutional Networks With Self-Calibrated Convolutions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[29]  Sadbhavana Babar,et al.  Where to Look?: Mining Complementary Image Regions for Weakly Supervised Object Localization , 2021, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[30]  Seil Na,et al.  Learning Visual Context by Comparison , 2020, ECCV.

[31]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Vladlen Koltun,et al.  Exploring Self-Attention for Image Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Yali Wang,et al.  Learning Attentive Pairwise Interaction for Fine-Grained Classification , 2020, AAAI.

[35]  Errui Ding,et al.  Compact Generalized Non-local Network , 2018, NeurIPS.

[36]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[37]  Seong Joon Oh,et al.  Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Meng Yang,et al.  Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[41]  Yi Yang,et al.  Self-produced Guidance for Weakly-supervised Object Localization , 2018, ECCV.

[42]  Chang Liu,et al.  DANet: Divergent Activation for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).