Sum-Fusion and Cascaded Interpolation for Semantic Image Segmentation

Semantic image segmentation assigns a category label to every pixel in an image, but it is difficult for a single model to extract good features for every category. Because the features of a given model may excel at classifying particular classes, combining different models may yield better results, but doing so requires heavy parameter tuning. As a compromise, we propose combining several convolutional layers with different kernel sizes to capture more detailed information. Our algorithm preserves the original structure of the fully convolutional network (FCN) but replaces the convolutional layer after the last pooling layer with four convolutional layers of different kernel sizes to extract multi-scale information. The four resulting sets of feature maps are fused into one set by element-wise summation, followed by a convolution operation. We also propose cascaded interpolation for deconvolution to obtain score maps as large as the corresponding input image. We evaluate our algorithm on the SIFT Flow dataset and find that it improves segmentation accuracy.
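The two ideas in the abstract can be illustrated with a minimal, single-channel sketch in pure Python. It is not the authors' implementation and rests on several assumptions the abstract does not confirm: the four layers are treated as parallel branches, the kernel sizes (1, 3, 5, 7) are hypothetical, averaging kernels stand in for learned weights, the convolution that follows the fusion is omitted, and "cascaded interpolation" is read as repeated 2x bilinear upsampling followed by a final resize to the input size. All function names are illustrative.

```python
# Illustrative sketch only: single channel, fixed (not learned) weights,
# hypothetical kernel sizes. Not the paper's actual network.

def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; output size equals input size."""
    h, w = len(image), len(image[0])
    k = len(kernel)  # square, odd-sized kernel assumed
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def box_kernel(k):
    """Hypothetical stand-in for learned weights: a k x k averaging kernel."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def sum_fuse(feature_map, kernel_sizes=(1, 3, 5, 7)):
    """Run parallel branches with different kernel sizes, then sum element-wise."""
    branches = [conv2d_same(feature_map, box_kernel(k)) for k in kernel_sizes]
    h, w = len(feature_map), len(feature_map[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


def bilinear_resize(image, out_h, out_w):
    """Bilinear interpolation that maps corner pixels onto corner pixels."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def cascaded_upsample(score_map, target_h, target_w):
    """Upsample in repeated 2x steps, then resize to the exact target size."""
    current = score_map
    while len(current) * 2 <= target_h and len(current[0]) * 2 <= target_w:
        current = bilinear_resize(current, len(current) * 2, len(current[0]) * 2)
    if (len(current), len(current[0])) != (target_h, target_w):
        current = bilinear_resize(current, target_h, target_w)
    return current


if __name__ == "__main__":
    coarse = [[float(i + j) for j in range(4)] for i in range(4)]
    fused = sum_fuse(coarse)                     # same 4 x 4 size as the input
    scores = cascaded_upsample(fused, 32, 32)    # 4 -> 8 -> 16 -> 32
    print(len(scores), len(scores[0]))
```

Summation keeps the fused feature map the same size as each branch, whereas concatenation would multiply the channel count by four; that is one plausible reason to prefer sum-fusion, though the abstract does not state the motivation.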
