A Dual-Network Progressive Approach to Weakly Supervised Object Detection

A major challenge that arises in Weakly Supervised Object Detection (WSOD) is that only image-level labels are available, whereas WSOD trains instance-level object detectors. A typical approach to WSOD is to 1) generate a series of region proposals for each image and assign the image-level label to all the proposals in that image; 2) train a classifier using all the proposals; and 3) use the classifier to select proposals with high confidence scores as the positive instances for another round of training. In this way, the image-level labels are iteratively transferred to instance-level labels. We aim to resolve the following two fundamental problems within this paradigm. First, existing proposal generation algorithms are not yet robust, thus the object proposals are often inaccurate. Second, the selected positive instances are sometimes noisy and unreliable, which hinders the training at subsequent iterations. We adopt two separate neural networks, one to focus on each problem, to better utilize the specific characteristic of region proposal refinement and positive instance selection. Further, to leverage the mutual benefits of the two tasks, the two neural networks are jointly trained and reinforced iteratively in a progressive manner, starting with easy and reliable instances and then gradually incorporating difficult ones at a later stage when the selection classifier is more robust. Extensive experiments on the PASCAL VOC dataset show that our method achieves state-of-the-art performance.

[1]  Zaïd Harchaoui,et al.  On learning to localize objects with minimal supervision , 2014, ICML.

[2]  Pascal Mettes,et al.  Weakly-Supervised Recognition, Localization, and Explanation of Visual Entities , 2016, ACM Multimedia.

[3]  Ivan Laptev,et al.  ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization , 2016, ECCV.

[4]  Thomas Deselaers,et al.  Localizing Objects While Learning Their Appearance , 2010, ECCV.

[5]  Yong Jae Lee,et al.  Weakly-supervised Discovery of Visual Pattern Configurations , 2014, NIPS.

[6]  Ming-Hsuan Yang,et al.  Weakly Supervised Object Localization with Progressive Domain Adaptation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Thomas Deselaers,et al.  Weakly Supervised Localization and Learning with Generic Knowledge , 2012, International Journal of Computer Vision.

[9]  Tinne Tuytelaars,et al.  Weakly supervised object detection with convex clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Zhi-Hua Zhou,et al.  A New Analysis of Co-Training , 2010, ICML.

[11]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Paul A. Viola,et al.  Unsupervised improvement of visual detectors using cotraining , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[13]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[14]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[15]  Deva Ramanan,et al.  Self-Paced Learning for Long-Term Tracking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[18]  Luc Van Gool,et al.  Weakly Supervised Cascaded Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Daphne Koller,et al.  Self-Paced Learning for Latent Variable Models , 2010, NIPS.

[21]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Maria-Florina Balcan,et al.  Co-Training and Expansion: Towards Bridging Theory and Practice , 2004, NIPS.

[23]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Deyu Meng,et al.  A theoretical understanding of self-paced learning , 2017, Inf. Sci..

[25]  Deyu Meng,et al.  Easy Samples First: Self-paced Reranking for Zero-Example Multimedia Search , 2014, ACM Multimedia.

[26]  Deyu Meng,et al.  Bridging Saliency Detection to Weakly Supervised Object Detection Based on Self-Paced Curriculum Learning , 2016, IJCAI.

[27]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[28]  Miaojing Shi,et al.  Weakly Supervised Object Localization Using Size Estimates , 2016, ECCV.

[29]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[30]  T. Tuytelaars,et al.  Weakly Supervised Object Detection with Posterior Regularization , 2014 .

[31]  Tao Xiang,et al.  In Defence of Negative Mining for Annotating Weakly Labelled Data , 2012, ECCV.

[32]  Wei Liu,et al.  Deep Self-Taught Learning for Weakly Supervised Object Localization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[34]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[35]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Qi Xie,et al.  Self-Paced Co-training , 2017, ICML.

[37]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[38]  Ivan Laptev,et al.  Weakly supervised object recognition with convolutional neural networks , 2014 .

[39]  Yi Yang,et al.  More is Less: A More Complicated Network with Less Inference Complexity , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Tao Xiang,et al.  Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Tao Xiang,et al.  Weakly supervised object detector learning with model drift detection , 2011, 2011 International Conference on Computer Vision.

[42]  Nicu Sebe,et al.  Feature Selection for Multimedia Analysis by Sharing Information Among Multiple Tasks , 2013, IEEE Transactions on Multimedia.

[43]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Dragomir Anguelov,et al.  Self-taught object localization with deep networks , 2014, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[45]  Mubarak Shah,et al.  Online detection and classification of moving objects using progressively improving detectors , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[47]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Chong Wang,et al.  Weakly Supervised Object Localization with Latent Category Learning , 2014, ECCV.

[49]  Cordelia Schmid,et al.  Multi-fold MIL Training for Weakly Supervised Object Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[51]  Yasuo Kuniyoshi,et al.  Weakly-supervised multi-class object detection using multi-type 3D features , 2013, MM '13.

[52]  Tinne Tuytelaars,et al.  Weakly Supervised Detection with Posterior Regularization , 2014, BMVC.

[53]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.