Double Shot: Preserve and Erase Based Class Attention Networks for Weakly Supervised Localization (Peca-Net)

Weakly supervised localization has attracted increasing attention because it requires only image-level labels. One mainstream approach, CAM-based top-down localization, suffers from poor resolution and tends to localize only the most discriminative regions. Another, model-agnostic perturbation-based localization, requires multiple iterations for each sample. In this paper, we introduce PECA-Net (Preserve and Erase Based Class Attention Networks), which uses a preserve-and-erase perturbed U-Net as its basis and a class activation mechanism as attention to enhance localization capability. The class attention module strengthens informative features and produces a basic localization. The preserve-and-erase perturbed U-Net replaces random, iterative extrinsic perturbation with meaningful erasing and, in addition, refines the preliminary localization. Because the target object is thus hit twice, we call the method a double shot. Experiments show that the method achieves new state-of-the-art localization error on both the CUB-200 and ILSVRC ImageNet datasets.
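
The two ingredients named in the abstract, a class activation map used as a soft mask and a preserve/erase split of the input, can be illustrated with a minimal pure-Python sketch. This is not the PECA-Net architecture: the abstract does not specify the network, so the function names (`class_activation_map`, `normalize`, `preserve_and_erase`), the min-max normalization, and the assumption that the mask already has the image's resolution are all illustrative choices.

```python
# Illustrative sketch only: all names and the normalization are assumptions,
# not the authors' implementation.

def class_activation_map(features, class_weights):
    """Weighted sum of K feature maps (each H x W), one weight per channel."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(features, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def normalize(cam):
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


def preserve_and_erase(image, mask):
    """Split an image into a preserved part (mask * image) and an
    erased part ((1 - mask) * image). Assumes mask and image share a shape."""
    preserved = [[p * m for p, m in zip(prow, mrow)]
                 for prow, mrow in zip(image, mask)]
    erased = [[p * (1.0 - m) for p, m in zip(prow, mrow)]
              for prow, mrow in zip(image, mask)]
    return preserved, erased


# Toy example: two 2x2 feature maps and a uniform single-channel image.
features = [
    [[0.0, 2.0], [0.0, 0.0]],
    [[0.0, 0.0], [2.0, 0.0]],
]
class_weights = [1.0, 0.5]
image = [[4.0, 4.0], [4.0, 4.0]]

mask = normalize(class_activation_map(features, class_weights))
preserved, erased = preserve_and_erase(image, mask)
```

In this toy setting the preserved and erased images sum back to the input pixel by pixel. A perturbation-based objective would typically ask a classifier to remain confident on the preserved image and lose confidence on the erased one; that training step is outside the scope of this sketch.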
