Tree-Structured Kronecker Convolutional Network for Semantic Segmentation

Most existing semantic segmentation methods employ atrous convolution to enlarge the receptive field of filters, but neglect partial information. To tackle this issue, we firstly propose a novel Kronecker convolution which adopts Kronecker product to expand the standard convolutional kernel for taking into account the partial feature neglected by atrous convolutions. Therefore, it can capture partial information and enlarge the receptive field of filters simultaneously without introducing extra parameters. Secondly, we propose a Tree-structured Feature Aggregation (TFA) module which follows a recursive rule to expand and forms a hierarchical structure. Thus, it can naturally learn representations of multi-scale objects and encode hierarchical contextual information in complex scenes. Finally, we design a Tree-structured Kronecker Convolutional Network (TKCN) which employs Kronecker convolution and TFA module. Extensive experiments on three datasets, PAS-CAL VOC 2012, PASCAL-Context and Cityscapes, verify the effectiveness of our proposed approach.

[1]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Rama Chellappa,et al.  Gaussian Conditional Random Field Network for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Xin Li,et al.  FoveaNet: Perspective-Aware Urban Scene Parsing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Kun Yu,et al.  DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ye Wang,et al.  Semantic Segmentation with Reverse Attention , 2017, BMVC.

[9]  Peter V. Gehler,et al.  Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Chen Sun,et al.  Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[12]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[14]  Gang Yu,et al.  Learning a Discriminative Feature Network for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Jan Kautz,et al.  Learning Affinity via Spatial Propagation Networks , 2017, NIPS.

[17]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[18]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[21]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[22]  Xiaogang Wang,et al.  Spatial As Deep: Spatial CNN for Traffic Scene Understanding , 2017, AAAI.

[23]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[24]  Anton van den Hengel,et al.  Wider or Deeper: Revisiting the ResNet Model for Visual Recognition , 2016, Pattern Recognit..

[25]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[26]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[29]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[30]  Sheng Tang,et al.  CGNet: A Light-Weight Context Guided Network for Semantic Segmentation , 2018, IEEE Transactions on Image Processing.

[31]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Gang Wang,et al.  Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Guosheng Lin,et al.  Exploring Context with Deep Structured Models for Semantic Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[37]  Sheng Tang,et al.  Scale-Adaptive Convolutions for Scene Parsing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Xiaoxiao Li,et al.  Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  S. R. Searle,et al.  On the history of the kronecker product , 1983 .

[42]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[43]  Shuchang Zhou,et al.  Exploiting Local Structures with the Kronecker Layer in Convolutional Networks , 2015, ArXiv.

[44]  Sheng Tang,et al.  Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions , 2017, IJCAI.