Paper information - Instance-Aware Semantic Segmentation via Multi-task Network Cascades

Instance-Aware Semantic Segmentation via Multi-task Network Cascades

Jian Sun

Kaiming He

Jifeng Dai

Abstract: Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multi-task Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure, and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure. Our solution is a clean, single-step training framework and can be generalized to cascades that have more stages. We demonstrate state-of-the-art instance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, our method takes only 360 ms to test an image using VGG-16, which is two orders of magnitude faster than previous systems for this challenging problem. As a by-product, our method also achieves compelling object detection results which surpass the competitive Fast/Faster R-CNN systems. The method described in this paper is the foundation of our submissions to the MS COCO 2015 segmentation competition, where we won 1st place.
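The cascade described in the abstract can be pictured as a data flow in which each stage consumes the shared features plus the outputs of the earlier stages. The following is only a toy sketch of that dependency structure, not the paper's networks: the function names, the thresholding rules, the `candidates` list, and the labels "bright" and "dim" are all hypothetical stand-ins chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), upper bounds exclusive
Grid = List[List[float]]


def shared_features(image: Grid) -> Grid:
    # Hypothetical stand-in for the shared convolutional layers (identity copy).
    return [row[:] for row in image]


def propose_boxes(feats: Grid, candidates: List[Box], thresh: float = 0.5) -> List[Box]:
    # Stage 1 (differentiating instances): keep boxes containing an active cell.
    kept = []
    for (r0, c0, r1, c1) in candidates:
        if any(feats[r][c] > thresh for r in range(r0, r1) for c in range(c0, c1)):
            kept.append((r0, c0, r1, c1))
    return kept


def estimate_mask(feats: Grid, box: Box, thresh: float = 0.5) -> List[List[int]]:
    # Stage 2 (estimating masks): binary mask inside a box produced by stage 1.
    r0, c0, r1, c1 = box
    return [[1 if feats[r][c] > thresh else 0 for c in range(c0, c1)]
            for r in range(r0, r1)]


def categorize(feats: Grid, box: Box, mask: List[List[int]]) -> str:
    # Stage 3 (categorizing objects): toy label from the mean masked activation.
    r0, c0, _, _ = box
    vals = [feats[r0 + i][c0 + j]
            for i, row in enumerate(mask)
            for j, m in enumerate(row) if m]
    return "bright" if sum(vals) / len(vals) > 0.8 else "dim"


def run_cascade(image: Grid, candidates: List[Box]):
    feats = shared_features(image)          # computed once, reused by all stages
    results = []
    for box in propose_boxes(feats, candidates):   # stage 1
        mask = estimate_mask(feats, box)           # stage 2 depends on stage 1
        label = categorize(feats, box, mask)       # stage 3 depends on stages 1 and 2
        results.append((box, mask, label))
    return results


image = [
    [0.9, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.7],
    [0.0, 0.0, 0.0, 0.0],
]
candidates = [(0, 0, 2, 2), (2, 2, 4, 4), (0, 2, 2, 4)]
results = run_cascade(image, candidates)
for box, mask, label in results:
    print(box, mask, label)
```

The sketch only mirrors the causal ordering: the features are computed once, and a later stage cannot run until the earlier stage has produced its output. It does not attempt the end-to-end training that the abstract identifies as the nontrivial part.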

References

[1] Lawrence D. Jackel, et al. Backpropagation Applied to Handwritten Zip Code Recognition, 1989, Neural Computation.

[2] Rich Caruana, et al. Multitask Learning, 1997, Machine-mediated learning.

[3] Subhransu Maji, et al. Semantic contours from inverse detectors, 2011, 2011 International Conference on Computer Vision.

[4] Cristian Sminchisescu, et al. Semantic Segmentation with Second-Order Pooling, 2012, ECCV.

[5] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[6] Cristian Sminchisescu, et al. CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Koen E. A. van de Sande, et al. Selective Search for Object Recognition, 2013, International Journal of Computer Vision.

[8] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.

[9] Jürgen Schmidhuber, et al. Compete to Compute, 2013, NIPS.

[10] Rob Fergus, et al. Visualizing and Understanding Convolutional Neural Networks, 2013.

[11] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12] Jitendra Malik, et al. Simultaneous Detection and Segmentation, 2014, ECCV.

[13] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[14] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[15] Jonathan T. Barron, et al. Multiscale Combinatorial Grouping, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16] C. Lawrence Zitnick, et al. Edge Boxes: Locating Object Proposals from Edges, 2014, ECCV.

[17] Jian Sun, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2015, IEEE Trans. Pattern Anal. Mach. Intell.

[18] Iasonas Kokkinos, et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, 2014, ICLR.

[19] Jian Sun, et al. Convolutional feature masking for joint object and stuff segmentation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Jitendra Malik, et al. Hypercolumns for object segmentation and fine-grained localization, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Ronan Collobert, et al. Learning to Segment Object Candidates, 2015, NIPS.

[23] Ross B. Girshick, et al. Fast R-CNN, 2015, 1504.08083.

[24] Vibhav Vineet, et al. Conditional Random Fields as Recurrent Neural Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26] Nikos Komodakis, et al. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27] George Papandreou, et al. Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation, 2015, ArXiv.

[28] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[29] Jian Sun, et al. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30] Andrew Zisserman, et al. Spatial Transformer Networks, 2015, NIPS.

[31] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

Citations

Aerial Imagery for Roof Segmentation: A Large-Scale Dataset towards Automatic Mapping of Buildings

ArXiv

2018

Attribute Driven Zero-Shot Classification and Segmentation

2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

2018

Exploring Flood Filling Networks for Instance Segmentation of XXL-Volumetric and Bulk Material CT Data

Journal of Nondestructive Evaluation

2020

Intelligent monitoring of indoor surveillance video based on deep learning

2019 21st International Conference on Advanced Communication Technology (ICACT)

2019

A review of object detection based on deep learning

Multimedia Tools and Applications

2020

Deep Cross-Domain Fashion Recommendation

RecSys

2017

Research on the Application of Instance Segmentation Algorithm in the Counting of Metro Waiting Population

ICGEC

2019

Attention Receptive Pyramid Network for Ship Detection in SAR Images

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

2020

Universal representations: The missing link between faces, text, planktons, and cat breeds

ArXiv

2017

360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

2020 IEEE Winter Conference on Applications of Computer Vision (WACV)

2019

Learning Region Features for Object Detection

ECCV

2018

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

2018 IEEE Intelligent Vehicles Symposium (IV)

2016

Multi-task human analysis in still images: 2D/3D pose, depth map, and multi-part segmentation

2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)

2019

Fine-Grained Recognition in the Wild: A Multi-task Domain Adaptation Approach

2017 IEEE International Conference on Computer Vision (ICCV)

2017

End-to-End Instance Segmentation and Counting with Recurrent Attention

ArXiv

2016

End-to-End Instance Segmentation with Recurrent Attention

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

2016

Real-time Factored ConvNets: Extracting the X Factor in Human Parsing

Controlling the Transport Defects of Power Generating Solar Panels

Object-level image segmentation with prior information

Instance Segmentation Based on Superpixel Module and Attention Module