Paper information - Where are the Masks: Instance Segmentation with Image-level Supervision

Where are the Masks: Instance Segmentation with Image-level Supervision

A major obstacle in instance segmentation is that existing methods often need many per-pixel labels in order to be effective. These labels require considerable human effort, and for certain applications they are not readily available. To address this limitation, we propose a novel framework that can effectively train with image-level labels, which are significantly cheaper to acquire. For instance, one can do an internet search for the term "car" and obtain many images where a car is present with minimal effort. Our framework consists of two stages: (1) train a classifier to generate pseudo masks for the objects of interest; (2) train a fully supervised Mask R-CNN on these pseudo masks. Our two main contributions are a pipeline that is simple to implement and amenable to different segmentation methods, and new state-of-the-art results for this problem setup. We evaluate our method on PASCAL VOC 2012, a standard dataset for weakly supervised methods, where we demonstrate major performance gains over existing methods with respect to mean average precision.
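The first of the two stages can be pictured with a toy sketch. The abstract does not say how the classifier's output is turned into pseudo masks, so the relative thresholding below is an assumption made purely for illustration and is not the authors' procedure; the names `pseudo_mask_from_activation` and `rel_threshold` are hypothetical.

```python
import numpy as np


def pseudo_mask_from_activation(activation, rel_threshold=0.5):
    """Binarize a per-class activation map into a boolean pseudo mask.

    activation: 2-D array of non-negative classifier scores for one class.
    rel_threshold: hypothetical cutoff, as a fraction of the peak score.
    """
    activation = np.asarray(activation, dtype=float)
    peak = activation.max()
    if peak <= 0:
        # No evidence for the class anywhere: return an empty mask.
        return np.zeros(activation.shape, dtype=bool)
    return activation >= rel_threshold * peak


# Toy 4x4 activation map standing in for a classifier's response to "car".
toy_activation = np.array([
    [0.0, 0.1, 0.1, 0.0],
    [0.1, 0.8, 0.9, 0.1],
    [0.1, 0.7, 1.0, 0.2],
    [0.0, 0.1, 0.2, 0.0],
])
toy_mask = pseudo_mask_from_activation(toy_activation)
```

In stage (2), masks like `toy_mask` would stand in for hand-drawn annotations when training a fully supervised model such as Mask R-CNN.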

Mark W. Schmidt | David Vázquez | Issam H. Laradji
