Simple feature pyramid network for weakly supervised object localization using multi-scale information

The purpose of weakly supervised object localization (WSOL) is to localize an object requiring only classification labels. However, most WSOL methods tend to find a specific part of an object. Further, they introduce more complex optimization problems than the classification problem to compensate for the lack of resources such as bounding box annotation. To be more efficient WSOL, we propose a new architecture that utilizes feature pyramid network (FPN) and multi-scale information to deal with simplified optimization and to improve the localization. In our proposed model, FPN produces multi-scale and high-quality feature maps, and then these feature maps are gathered to conduct classification. Therefore, we can use high-quality and abundant information for localization, which induces several advantages. First, our proposed model improves localization. Second, we don’t have to require solving complex optimization problem. In particular, the second advantage alleviates a significant burden such as hyperparameter tuning. Also, we confirmed through experiments that our proposed method outperforms state-of-the-art methods on the CUB-200-2011 and ILSVRC datasets.

[1]  Yong Jiang,et al.  Double Shot: Preserve and Erase Based Class Attention Networks for Weakly Supervised Localization (Peca-Net) , 2020, 2020 IEEE International Conference on Multimedia and Expo (ICME).

[2]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Ying Chen,et al.  M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network , 2018, AAAI.

[4]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[5]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[8]  Hyunjung Shim,et al.  Attention-Based Dropout Layer for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[10]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Meng Yang,et al.  Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yi Yang,et al.  Adversarial Complementary Learning for Weakly Supervised Object Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[18]  Yarin Gal,et al.  Real Time Image Saliency for Black Box Classifiers , 2017, NIPS.

[19]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[20]  Yi Yang,et al.  Self-produced Guidance for Weakly-supervised Object Localization , 2018, ECCV.

[21]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Chang Liu,et al.  DANet: Divergent Activation for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Marco Pedersoli,et al.  Convolutional STN for Weakly Supervised Object Localization , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).