Learning Deconvolution Network for Semantic Segmentation

We propose a novel semantic segmentation algorithm by learning a deep deconvolution network. We learn the network on top of the convolutional layers adopted from VGG 16-layer net. The deconvolution network is composed of deconvolution and unpooling layers, which identify pixelwise class labels and predict segmentation masks. We apply the trained network to each proposal in an input image, and construct the final semantic segmentation map by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks by integrating deep deconvolution network and proposal-wise prediction, our segmentation method typically identifies detailed structures and handles objects in multiple scales naturally. Our network demonstrates outstanding performance in PASCAL VOC 2012 dataset, and we achieve the best accuracy (72.5%) among the methods trained without using Microsoft COCO dataset through ensemble with the fully convolutional network.

[1]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[3]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[4]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[5]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[10]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[12]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[13]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[14]  Pedro H. O. Pinheiro,et al.  Weakly Supervised Semantic Segmentation with Convolutional Networks , 2014, ArXiv.

[15]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[16]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[17]  Gregory Shakhnarovich,et al.  Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[18]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[19]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[22]  Seunghoon Hong,et al.  Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation , 2015, NIPS.

[23]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling , 2015, CVPR 2015.

[24]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[26]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[27]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[29]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[30]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.