Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection

Weakly Supervised Object Detection (WSOD) has emerged as an effective tool to train object detectors using only image-level category labels. However, without object-level labels, WSOD detectors are prone to placing bounding boxes on salient objects, clustered objects, and discriminative object parts. Moreover, image-level category labels do not enforce consistent object detection across different transformations of the same image. To address these issues, we propose a Comprehensive Attention Self-Distillation (CASD) training approach for WSOD. To balance feature learning among all object instances, CASD computes the comprehensive attention aggregated from multiple transformations and feature layers of the same image. To enforce consistent spatial supervision on objects, CASD conducts self-distillation on the WSOD network, such that the comprehensive attention is approximated simultaneously by multiple transformations and feature layers of the same image. CASD produces new state-of-the-art WSOD results on standard benchmarks such as PASCAL VOC 2007/2012 and MS-COCO.
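The aggregation and self-distillation steps described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The attention form (channel-wise mean of rectified activations followed by a sigmoid), the element-wise max aggregation, the nearest-neighbour upsampling and the mean-squared-error loss are all illustrative choices. The helper names (`attention_map`, `upsample`, `comprehensive_attention`, `self_distillation_loss`) are hypothetical, and random arrays stand in for real detector features.

```python
import numpy as np


def attention_map(features: np.ndarray) -> np.ndarray:
    """Spatial attention (H, W) from a (C, H, W) feature map.
    Assumed form: channel-wise mean of ReLU-rectified activations, then a sigmoid."""
    pooled = np.maximum(features, 0.0).mean(axis=0)
    return 1.0 / (1.0 + np.exp(-pooled))


def upsample(att: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling so maps from coarser layers align spatially."""
    return np.repeat(np.repeat(att, factor, axis=0), factor, axis=1)


def comprehensive_attention(maps):
    """Aggregate aligned (H, W) maps by element-wise max (an assumption), so a
    region highlighted under any view or layer stays highlighted in the target."""
    return np.maximum.reduce(maps)


def self_distillation_loss(maps, target):
    """Mean squared error between each individual map and the aggregated target.
    In a real training loop the target would be detached (no gradient flows into it)."""
    return float(np.mean([np.mean((m - target) ** 2) for m in maps]))


# Toy example: random arrays stand in for detector features (hypothetical data).
rng = np.random.default_rng(0)
feat_orig = rng.normal(size=(8, 4, 4))   # features of the original image
feat_flip = rng.normal(size=(8, 4, 4))   # features of the horizontally flipped image
feat_deep = rng.normal(size=(16, 2, 2))  # a deeper, coarser feature layer

att_orig = attention_map(feat_orig)
att_flip = attention_map(feat_flip)[:, ::-1]     # undo the flip to align with the original
att_deep = upsample(attention_map(feat_deep), 2)  # bring the coarse map to 4 x 4

maps = [att_orig, att_flip, att_deep]
target = comprehensive_attention(maps)
loss = self_distillation_loss(maps, target)
print(target.shape, round(loss, 4))
```

Taking the element-wise maximum keeps every region that any single view or layer attends to, which is one plausible way to spread attention beyond the most salient instance. The loss then pulls each individual map toward that shared target, which is what makes the supervision consistent across transformations and layers.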
