Weakly Supervised Region Proposal Network and Object Detection

The Convolutional Neural Network (CNN) based region proposal generation method (i.e., the region proposal network), trained using bounding box annotations, is an essential component of modern fully supervised object detectors. However, Weakly Supervised Object Detection (WSOD) has not benefited from CNN-based proposal generation because bounding box annotations are absent, and it still relies on standard proposal generation methods such as selective search. In this paper, we propose a weakly supervised region proposal network that is trained using only image-level annotations. The network consists of two stages. The first stage evaluates the objectness scores of sliding window boxes by exploiting the low-level information in the CNN, and the second stage refines the proposals from the first stage using a region-based CNN classifier. Our proposed region proposal network is suitable for WSOD, can be plugged into a WSOD network easily, and can share its convolutional computations with the WSOD network. Experiments on the PASCAL VOC and ImageNet detection datasets show that our method achieves state-of-the-art performance for WSOD, with a performance gain of about \(3\%\) on average.
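
To make the two-stage structure concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that the low-level CNN information is available as a single 2-D response map, that objectness is the mean response inside a box, and that the region-based classifier is an arbitrary callable returning one score per box. The names `sliding_windows`, `stage_one`, `stage_two`, and `classifier` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sliding_windows(height, width, size, stride):
    """Enumerate square boxes as (y0, x0, y1, x1), end-exclusive."""
    return [(y, x, y + size, x + size)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]


def box_mean(integral, box):
    """Mean response inside a box, read from a zero-padded integral image."""
    y0, x0, y1, x1 = box
    total = (integral[y1, x1] - integral[y0, x1]
             - integral[y1, x0] + integral[y0, x0])
    return total / ((y1 - y0) * (x1 - x0))


def stage_one(response, boxes, top_k):
    """Assumed stage 1: rank sliding-window boxes by mean low-level response."""
    # Pad one zero row and column so box sums need no boundary checks.
    padded = np.pad(response, ((1, 0), (1, 0)), mode="constant")
    integral = padded.cumsum(axis=0).cumsum(axis=1)
    scores = np.array([box_mean(integral, b) for b in boxes])
    order = np.argsort(-scores)[:top_k]
    return [boxes[i] for i in order], scores[order]


def stage_two(boxes, classifier, top_k):
    """Assumed stage 2: re-rank stage-1 proposals with a region classifier."""
    scores = np.array([classifier(b) for b in boxes])
    order = np.argsort(-scores)[:top_k]
    return [boxes[i] for i in order], scores[order]


# Toy input: an 8x8 response map with one bright 4x4 "object".
response = np.zeros((8, 8))
response[2:6, 2:6] = 1.0

boxes = sliding_windows(8, 8, size=4, stride=2)
coarse, coarse_scores = stage_one(response, boxes, top_k=3)


def classifier(box):
    """Hypothetical stand-in for a region-based CNN classifier."""
    return -abs(box[0] - 2) - abs(box[1] - 2)


refined, refined_scores = stage_two(coarse, classifier, top_k=1)
```

In this sketch the first stage is cheap and discards most windows, so the second, more expensive scorer only sees a short list, which mirrors the coarse-to-fine division of labor described above.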
