Paper information: Efficient Semantic Segmentation Using Spatio-Channel Dilated Convolutions

There has been increasing interest in reducing computational cost in order to develop efficient deep convolutional neural networks (DCNNs) for real-time semantic segmentation. In this paper, we introduce an efficient convolution method, Spatio-Channel dilated convolution (SCDC), which is composed of structured sparse kernels based on the split-transform-merge principle. Specifically, SCDC employs kernels whose shapes are dilated not only in the spatial domain but also in the channel domain, using a channel sampling approach. Based on SCDC, we propose an efficient convolutional module named the Efficient Spatio-Channel dilated convolution (ESC) module. Using ESC modules, we further propose ESCNet, built on the ESPNet architecture, which is one of the state-of-the-art real-time semantic segmentation networks and can be easily deployed on edge devices. We evaluated ESCNet on the Cityscapes dataset and obtained competitive results, with a good trade-off between accuracy and computational cost. ESCNet achieves 61.5% mean intersection over union (mIoU) with only 196K network parameters and processes high-resolution images at 164 frames per second (FPS) on a standard Titan Xp GPU. Various experimental results show that our method is reasonably accurate, light, and fast.
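The abstract describes SCDC only at a high level. The NumPy sketch below illustrates one plausible reading of split-transform-merge with channel dilation. It is not the authors' implementation and rests on stated assumptions: a channel dilation rate d means group g samples input channels g, g+d, g+2d, …; each group is convolved with its own spatially dilated k×k kernel; and the group outputs are concatenated. The function names `dilated_conv2d`, `scdc`, and `conv_params` are hypothetical. Under these assumptions each output channel sees only C_in/d input channels, so the weight count falls from k²·C_in·C_out to k²·(C_in/d)·C_out.

```python
import numpy as np


def dilated_conv2d(x, w, dilation):
    """Spatially dilated 'same' convolution (cross-correlation, as in DCNN frameworks).

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    Returns a (C_out, H, W) map; zero padding keeps the spatial size.
    """
    k = w.shape[-1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            # Input samples seen by kernel tap (i, j), spaced `dilation` pixels apart.
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def scdc(x, weights, channel_dilation, spatial_dilation):
    """Hypothetical spatio-channel dilated convolution (split-transform-merge).

    Split:     group g takes channels g, g+d, g+2d, ... (assumed channel dilation d).
    Transform: each group is convolved with its own spatially dilated kernel,
               weights[g] of shape (C_out/d, C_in/d, k, k).
    Merge:     group outputs are concatenated along the channel axis.
    """
    d = channel_dilation
    assert x.shape[0] % d == 0, "assumes C_in is divisible by the channel dilation"
    outs = [dilated_conv2d(x[g::d], weights[g], spatial_dilation) for g in range(d)]
    return np.concatenate(outs, axis=0)


def conv_params(c_in, c_out, k, channel_dilation=1):
    """Weight count when each output channel sees only c_in / d input channels."""
    return k * k * (c_in // channel_dilation) * c_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 4, 3
    x = rng.standard_normal((64, 32, 32))
    w = [rng.standard_normal((16, 16, k, k)) for _ in range(d)]
    print(scdc(x, w, channel_dilation=d, spatial_dilation=2).shape)  # (64, 32, 32)
    print(conv_params(64, 64, k), conv_params(64, 64, k, channel_dilation=d))  # 36864 9216
```

In this sketch, setting d = 1 recovers an ordinary dilated convolution, and for d > 1 the result equals a dense convolution whose kernel is zero outside each group's sampled channels. That zero pattern is what "structured sparse kernel" refers to here, and skipping the zeros is where the saving in parameters and multiply-adds comes from.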

Jae Myung Kim | Yong Seok Heo | Jaeseon Kim | Y. S. Heo