论文信息 - Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

Manually labeling datasets with object masks is extremely time consuming. In this work, we follow the idea of Polygon-RNN [4] to produce polygonal annotations of objects interactively using humans-in-the-loop. We introduce several important improvements to the model: 1) we design a new CNN encoder architecture, 2) show how to effectively train the model with Reinforcement Learning, and 3) significantly increase the output resolution using a Graph Neural Network, allowing the model to accurately annotate high-resolution objects in images. Extensive evaluation on the Cityscapes dataset [8] shows that our model, which we refer to as Polygon-RNN++, significantly outperforms the original model in both automatic (10% absolute and 16% relative improvement in mean IoU) and interactive modes (requiring 50% fewer clicks by annotators). We further analyze the cross-domain scenario in which our model is trained on one dataset, and used out of the box on datasets from varying domains. The results show that Polygon-RNN++ exhibits powerful generalization capabilities, achieving significant improvements over existing pixel-wise methods. Using simple online fine-tuning we further achieve a high reduction in annotation time for new datasets, moving a step closer towards an interactive annotation tool to be used in practice.

[1] Marie-Pierre Jolly,et al. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[2] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[3] Vladimir Kolmogorov,et al. An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Andrew Blake,et al. "GrabCut" , 2004, ACM Trans. Graph..

[5] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[6] Carissa G. Fonseca,et al. Rationale and Design for the Defibrillators to Reduce Risk by Magnetic Resonance Imaging Evaluation (DETERMINE) Trial , 2009, Journal of cardiovascular electrophysiology.

[7] Sven J. Dickinson,et al. Superedge grouping for object localization by combining appearance and shape information , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Gerhard Stephan,et al. Segmented anisotropic ssTEM dataset of neural tissue , 2013 .

[10] Pascal Fua,et al. Free-Shape Polygonal Object Localization , 2014, ECCV.

[11] Nicholas Ayache,et al. A collaborative resource to build consensus for automated left ventricular segmentation of cardiac MR images , 2014, Medical Image Anal..

[12] Sanja Fidler,et al. Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Thomas Brox,et al. Video Segmentation with Just a Few Strokes , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[15] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[17] Ronan Collobert,et al. Learning to Segment Object Candidates , 2015, NIPS.

[18] Sanja Fidler,et al. Monocular Object Instance Segmentation and Depth Ordering with CNNs , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19] Dit-Yan Yeung,et al. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[20] Florent Lafarge,et al. Towards Large-Scale City Reconstruction from Satellites , 2016, ECCV.

[21] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Marc'Aurelio Ranzato,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[23] Ronan Collobert,et al. Learning to Refine Object Segments , 2016, ECCV.

[24] Jitendra Malik,et al. Iterative Instance Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Richard S. Zemel,et al. Gated Graph Sequence Neural Networks , 2015, ICLR.

[26] Philip H. S. Torr,et al. Recurrent Instance Segmentation , 2015, ECCV.

[27] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[28] Sanja Fidler,et al. Instance-Level Segmentation for Autonomous Driving with Deep Densely Connected MRFs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Xiaogang Wang,et al. Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Bastian Leibe,et al. Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Bolei Zhou,et al. Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Sanja Fidler,et al. 3D Graph Neural Networks for RGBD Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35] Sanja Fidler,et al. Annotating Object Instances with a Polygon-RNN , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37] Konstantinos Kamnitsas,et al. DeepCut: Object Segmentation From Bounding Box Annotations Using Convolutional Neural Networks , 2016, IEEE Transactions on Medical Imaging.

[38] Sergio Guadarrama,et al. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Min Bai,et al. Deep Watershed Transform for Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40] Min Bai,et al. TorontoCity: Seeing the World with a Million Eyes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[41] Sanja Fidler,et al. SGN: Sequential Grouping Networks for Instance Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.