Saliency Aware: Weakly Supervised Object Localization

Object localization aims to locate the object in a given image. Owing to the recent success of convolutional neural networks (CNNs), existing methods have shown promising results in the weakly supervised setting. By training a classifier, these methods learn to localize objects by visualizing class-discriminative localization maps derived from the classification prediction. However, correct classification does not guarantee good localization, since the model may focus only on the most discriminative parts rather than the entire object. To address this issue, we propose a novel end-to-end trainable network for weakly supervised object localization. The key insights of our algorithm are twofold. First, to encourage our model to focus on detecting foreground objects, we develop a salient object detection module. Second, we propose a perceptual triplet loss that further enhances the foreground object detection capability. As such, our model learns to predict objectness, resulting in more accurate localization. We conduct experiments on the challenging ILSVRC dataset. Extensive experimental results demonstrate that the proposed approach performs favorably against state-of-the-art methods.
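To make the limitation of classifier-based localization concrete, the following is a minimal sketch of how a class-discriminative localization map can be turned into a bounding box. It illustrates the general idea only and is not the network proposed here. The function names, the toy feature maps, the weights, and the thresholds are all assumptions chosen for the example, and the box is drawn around every above-threshold cell rather than around the largest connected region.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def class_activation_map(features: List[Grid], class_weights: List[float]) -> Grid:
    """Weighted sum of K feature maps using one class's classifier weights."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def bounding_box(cam: Grid, rel_threshold: float = 0.2) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) around cells >= rel_threshold * peak."""
    peak = max(max(row) for row in cam)
    if peak <= 0:
        return None  # no positive evidence for the class anywhere
    cutoff = rel_threshold * peak
    hits = [(x, y) for y, row in enumerate(cam) for x, v in enumerate(row) if v >= cutoff]
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical toy input: two 3x4 feature maps from a last convolutional layer.
features = [
    [[0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.2, 0.0],
     [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.3, 0.0],
     [0.0, 0.0, 0.3, 0.0]],
]
class_weights = [1.0, 0.5]  # assumed classifier weights for the predicted class

cam = class_activation_map(features, class_weights)
box_loose = bounding_box(cam, rel_threshold=0.1)  # covers weaker responses too
box_tight = bounding_box(cam, rel_threshold=0.9)  # only the strongest response
```

In this toy case the high threshold keeps only the single strongest cell, while the low threshold also covers the weaker neighbouring responses. This mirrors the failure mode described above, where a classifier's map concentrates on the most discriminative part of the object.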
