DifNet: Semantic Segmentation by Diffusion Networks

Deep Neural Networks (DNNs) have recently shown state-of-the-art performance on semantic segmentation tasks; however, they still suffer from poor boundary localization and spatially fragmented predictions. The difficulty lies in the requirement to make dense predictions from a long-path model all at once, since details are hard to preserve as data passes through deeper layers. In this work, we instead decompose this difficult task into two relatively simple sub-tasks: seed detection, which is required to produce initial predictions without the need for completeness or precision, and similarity estimation, which measures how likely any two nodes are to belong to the same class without the need to know which class that is. We use one network branch for each sub-task, and apply a cascade of random walks based on hierarchical semantics to approximate a complex diffusion process that propagates seed information to the whole image according to the estimated similarities. The proposed DifNet consistently produces improvements over baseline models with the same depth and an equivalent number of parameters, and also achieves promising performance on the Pascal VOC and Pascal Context datasets. Our DifNet is trained end-to-end without complex loss functions.
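
To make the propagation idea concrete, the following is a minimal NumPy sketch of a cascade of random-walk steps that spreads seed scores over a small graph. It is an illustration under stated assumptions, not the DifNet implementation: the function names, the row normalization of the similarity matrix, the `mix` blending weight, and the toy 4-node chain are all hypothetical choices made for this example, and a real model would predict both the seed and the similarities with network branches.

```python
import numpy as np


def row_normalize(similarity):
    """Turn a non-negative similarity matrix into a row-stochastic transition matrix."""
    sums = similarity.sum(axis=1, keepdims=True)
    return similarity / np.maximum(sums, 1e-12)


def cascade_random_walks(seed, similarities, mix=0.5):
    """Propagate seed scores through a cascade of random-walk steps.

    seed:         (N, C) array of initial per-node class scores.
    similarities: list of (N, N) non-negative similarity matrices, one per step.
    mix:          hypothetical blending weight between the propagated scores
                  and the original seed (an assumption of this sketch).
    """
    scores = seed
    for sim in similarities:
        transition = row_normalize(sim)
        # One random-walk step, blended with the seed so it is not washed out.
        scores = mix * (transition @ scores) + (1.0 - mix) * seed
    return scores


# Toy example: 4 nodes on a chain, 2 classes; only nodes 0 and 3 carry seeds.
seed = np.array([
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
])
# Nodes 0-1 and 2-3 are strongly similar; the 1-2 link is weak (a "boundary").
sim = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])

scores = cascade_random_walks(seed, [sim, sim, sim])
labels = scores.argmax(axis=1)
print(labels)
```

In this toy setup the unlabeled nodes 1 and 2 inherit the class of their strongly connected neighbor, while the weak link between them keeps the two regions apart. The sketch reuses one similarity matrix at every step for brevity; the function accepts a different matrix per step, which is closer in spirit to the hierarchical cascade the abstract describes.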
