Coarse2Fine: Local Consistency Aware Re-prediction for Weakly Supervised Object Localization

Weakly supervised object localization aims to localize objects of interest using only image-level labels. Existing methods generally threshold the activation map to obtain a mask and then generate a bounding box from it. However, the activation map is locally inconsistent, i.e., similar neighboring pixels of the same object are not equally activated. This leads to the blurred boundary issue: the localization result is sensitive to the threshold, and the mask obtained directly from the activation map loses the fine contours of the object, making it difficult to obtain a tight bounding box. In this paper, we introduce the Local Consistency Aware Re-prediction (LCAR) framework, which aims to recover the complete, fine object mask from the locally inconsistent activation map and hence obtain a tight bounding box. To this end, we propose the self-guided re-prediction module (SGRM), which employs a novel superpixel aggregation network to replace threshold segmentation as the post-processing step. To derive more reliable pseudo labels from the activation map for supervising the SGRM, we further design an affinity refinement module (ARM), which uses the original image features to better align the activation map with the image appearance, and a self-distillation CAM (SD-CAM), which alleviates the locator's dependence on saliency. Experiments demonstrate that LCAR outperforms the state of the art on both the CUB-200-2011 and ILSVRC datasets, achieving GT-known localization accuracy of 95.89% and 70.72%, respectively.
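
The threshold sensitivity described above can be illustrated with a minimal sketch of the conventional post-processing step (thresholding an activation map and boxing the foreground). This is not the LCAR method; the function name `cam_to_bbox`, the toy activation map, and the choice to box all above-threshold pixels (rather than, say, the largest connected component) are assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def cam_to_bbox(cam: List[List[float]], threshold: float) -> Optional[Box]:
    """Threshold a min-max normalized activation map and box the foreground.

    Hypothetical helper: boxes every pixel whose normalized activation
    is >= threshold; returns None if no pixel passes.
    """
    values = [v for row in cam for v in row]
    lo, hi = min(values), max(values)
    scale = (hi - lo) or 1.0  # avoid division by zero on a flat map
    xs, ys = [], []
    for y, row in enumerate(cam):
        for x, v in enumerate(row):
            if (v - lo) / scale >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Toy map (assumed data): one object spanning columns 1..4 whose
# pixels are unevenly activated, i.e. locally inconsistent.
toy_cam = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.9, 1.0, 0.4, 0.0],
    [0.0, 0.2, 0.8, 0.9, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

low_threshold_box = cam_to_bbox(toy_cam, 0.2)
high_threshold_box = cam_to_bbox(toy_cam, 0.5)
print(low_threshold_box, high_threshold_box)
```

In this toy case the lower threshold covers the full object extent, while the higher one keeps only the most strongly activated columns, so the resulting box depends heavily on a hand-picked value.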
