Hierarchical Complementary Learning for Weakly Supervised Object Localization

Weakly supervised object localization (WSOL) is a challenging problem which aims to localize objects with only image-level labels. Due to the lack of ground truth bounding boxes, class labels are mainly employed to train the model. This model generates a class activation map (CAM) which activates the most discriminate features. However, the main drawback of CAM is the ability to detect just a part of the object. To solve this problem, some researchers have removed parts from the detected object [1, 2, 3], or the image [4]. The aim of removing parts from image or detected parts of the object is to force the model to detect the other features. However, these methods require one or many hyperparameters to erase the appropriate pixels on the image, which could involve a loss of information. In contrast, this paper proposes a Hierarchical Complementary Learning Network method (HCLNet) that helps the CNN to perform better classification and localization of objects on the images. HCLNet uses a complementary map to force the network to detect the other parts of the object. Unlike previous works, this method does not need any extras hyperparameters to generate different CAMs, as well as does not introduce a big loss ?Fully documented templates are available in the elsarticle package on CTAN. ∗Corresponding author Email addresses: benassou.narimene@hit.edu.cn (Sabrina Narimene Benassou), wzhshi@szu.edu.cn (Wuzhen Shi), fjiang@hit.edu.cn (Feng Jiang), Abdallah.benzine@digeiz.com (Abdallah Benzine) Preprint submitted to Journal of LTEX Templates November 17, 2020 ar X iv :2 01 1. 08 01 4v 1 [ cs .C V ] 1 6 N ov 2 02 0 of information. In order to fuse these different maps, two different fusion strategies known as the addition strategy and the l1-norm strategy have been used. These strategies allowed to detect the whole object while excluding the background. Extensive experiments show that HCLNet obtains better performance than state-of-the-art methods.

[1]  Yi Yang,et al.  Adversarial Complementary Learning for Weakly Supervised Object Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Bernard Ghanem,et al.  W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Thomas Deselaers,et al.  Weakly Supervised Localization and Learning with Generic Knowledge , 2012, International Journal of Computer Vision.

[4]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[6]  P. Alam ‘T’ , 2021, Composites Engineering: An A–Z Guide.

[7]  Hui Li,et al.  DenseFuse: A Fusion Approach to Infrared and Visible Images , 2018, IEEE Transactions on Image Processing.

[8]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[10]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jinjun Xiong,et al.  TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection , 2018, ECCV.

[13]  Wenyu Liu,et al.  Multiple Instance Detection Network with Online Instance Classifier Refinement , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Javier R. Movellan,et al.  Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise , 2009, NIPS.

[15]  Hyunjung Shim,et al.  Attention-Based Dropout Layer for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Qixiang Ye,et al.  Min-Entropy Latent Model for Weakly Supervised Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[19]  Yi Yang,et al.  Self-produced Guidance for Weakly-supervised Object Localization , 2018, ECCV.

[20]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Chang Liu,et al.  C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Luc Van Gool,et al.  Weakly Supervised Cascaded Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[24]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[25]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Fuqiang Zhou,et al.  FSSD: Feature Fusion Single Shot Multibox Detector , 2017, ArXiv.

[27]  Kaiming He,et al.  Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.