Semantic segmentation via highly fused convolutional network with multiple soft cost functions

Semantic image segmentation is one of the most challenged tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: feature downsampling, combined feature upsampling and multiple predictions. We adopt a strategy of multiple steps of upsampling and combined feature maps in pooling layers with its corresponding unpooling layers. Then we bring out multiple pre-outputs, each pre-output is generated from an unpooling layer by one-step upsampling. Finally, we concatenate these pre-outputs to get the final output. As a result, our proposed network makes highly use of the feature information by fusing and reusing feature maps. In addition, when training our model, we add multiple soft cost functions on pre-outputs and final outputs. In this way, we can reduce the loss reduction when the loss is back propagated. We evaluate our model on three major segmentation datasets: CamVid, PASCAL VOC and ADE20K. We achieve a state-of-the-art performance on CamVid dataset, as well as considerable improvements on PASCAL VOC dataset and ADE20K dataset

[1]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[6]  Vladlen Koltun,et al.  Feature Space Optimization for Semantic Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Xiaofeng Wang,et al.  An efficient local Chan-Vese model for image segmentation , 2010, Pattern Recognit..

[8]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[9]  De-Shuang Huang,et al.  A General CPL-AdS Methodology for Fixing Dynamic Parameters in Dual Environments , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[10]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xiaogang Wang,et al.  Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification , 2014, ArXiv.

[13]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[14]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[15]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[16]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[17]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[18]  Tao Yang,et al.  Fully Combined Convolutional Network with Soft Cost Function for Traffic Scene Parsing , 2017, ICIC.

[19]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[21]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Truong Q. Nguyen,et al.  Semantic video segmentation: Exploring inference efficiency , 2015, 2015 International SoC Design Conference (ISOCC).

[23]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Yoshua Bengio,et al.  ReSeg: A Recurrent Neural Network for Object Segmentation , 2015, ArXiv.

[25]  José García Rodríguez,et al.  A Review on Deep Learning Techniques Applied to Semantic Segmentation , 2017, ArXiv.

[26]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Bolei Zhou,et al.  Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.

[28]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[30]  Xiaoxiao Li,et al.  Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Dumitru Erhan,et al.  Scalable, High-Quality Object Detection , 2014, ArXiv.

[32]  De-Shuang Huang,et al.  A constructive approach for finding arbitrary roots of polynomials by neural networks , 2004, IEEE Transactions on Neural Networks.

[33]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[34]  Peter V. Gehler,et al.  Video Propagation Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Gregory Shakhnarovich,et al.  Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[37]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[38]  Yoshua Bengio,et al.  The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[39]  Xiaofeng Wang,et al.  A Novel Multi-Layer Level Set Method for Image Segmentation , 2008, J. Univers. Comput. Sci..

[40]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.