论文信息 - Semantic segmentation via highly fused convolutional network with multiple soft cost functions

Semantic segmentation via highly fused convolutional network with multiple soft cost functions

Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then we bring out multiple pre-outputs, each pre-output is generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to get the final output. As a result, our proposed network makes highly use of the feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on pre-outputs and final outputs. In this way, we can reduce the loss reduction when the loss is back propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC and ADE20K. We achieve a state-of-the-art performance on CamVid dataset, as well as considerable improvements on PASCAL VOC dataset and ADE20K dataset

Tao Yang | Linting Guan | Yan Wu | Junqiao Zhao

[1] Vladlen Koltun,et al. Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[2] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3] Roberto Cipolla,et al. Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[4] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Yi Li,et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[6] Vladlen Koltun,et al. Feature Space Optimization for Semantic Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Xiaofeng Wang,et al. An efficient local Chan-Vese model for image segmentation , 2010, Pattern Recognit..

[8] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[9] De-Shuang Huang,et al. A General CPL-AdS Methodology for Fixing Dynamic Parameters in Dual Environments , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[10] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Xiaogang Wang,et al. Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Xiaogang Wang,et al. Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification , 2014, ArXiv.

[13] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[14] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[15] Andrea Vedaldi,et al. MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[16] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.

[17] 한보형,et al. Learning Deconvolution Network for Semantic Segmentation , 2015 .

[18] Tao Yang,et al. Fully Combined Convolutional Network with Soft Cost Function for Traffic Scene Parsing , 2017, ICIC.

[19] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[21] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Truong Q. Nguyen,et al. Semantic video segmentation: Exploring inference efficiency , 2015, 2015 International SoC Design Conference (ISOCC).

[23] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Yoshua Bengio,et al. ReSeg: A Recurrent Neural Network for Object Segmentation , 2015, ArXiv.

[25] José García Rodríguez,et al. A Review on Deep Learning Techniques Applied to Semantic Segmentation , 2017, ArXiv.

[26] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Bolei Zhou,et al. Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.

[28] Ian D. Reid,et al. RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[30] Xiaoxiao Li,et al. Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31] Dumitru Erhan,et al. Scalable, High-Quality Object Detection , 2014, ArXiv.

[32] De-Shuang Huang,et al. A constructive approach for finding arbitrary roots of polynomials by neural networks , 2004, IEEE Transactions on Neural Networks.

[33] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[34] Peter V. Gehler,et al. Video Propagation Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Guosheng Lin,et al. Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Gregory Shakhnarovich,et al. Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[37] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[38] Yoshua Bengio,et al. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[39] Xiaofeng Wang,et al. A Novel Multi-Layer Level Set Method for Image Segmentation , 2008, J. Univers. Comput. Sci..

[40] Roberto Cipolla,et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.