Paper information - EKENet: Efficient knowledge enhanced network for real-time scene parsing

EKENet: Efficient knowledge enhanced network for real-time scene parsing

Abstract: Scene parsing is essential for many high-level AI applications, such as intelligent vehicles and traffic surveillance. In this work, we propose a highly efficient and powerful deep convolutional neural network, the Efficient Knowledge Enhanced Network (EKENet), for parsing scenes in real time. Unlike most existing approaches, which sacrifice efficiency for the sake of high accuracy, EKENet achieves a strong trade-off between the two. EKENet is built upon a novel building block, the Efficient Dual Abstraction (EDA) block, which employs an efficient parallel convolution structure to extract spatial features and model cross-channel correlations in a dual fashion. Additionally, a novel lightweight Encoding-Enhancing (EE) module is designed to enhance EKENet: it efficiently encodes high-level knowledge extracted from the top layers to guide the learning of low-level features in the bottom layers. Extensive experiments on two challenging benchmarks, Cityscapes and CamVid, demonstrate that EKENet achieves a new state-of-the-art trade-off between speed and accuracy.
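The abstract describes the EDA block only at a high level: parallel convolutions that handle spatial features and cross-channel correlations separately. The sketch below shows one possible reading of that idea, not the paper's actual design. It assumes a depthwise 3x3 branch for spatial features and a 1x1 pointwise branch for channel mixing, both applied to the same input and summed before a ReLU. The function names, the summation, and the ReLU are all illustrative assumptions, and the code is dependency-free Python for readability rather than speed.

```python
# Hypothetical sketch of a "dual" parallel block. This is NOT the EDA block
# from the paper; branch types, fusion by summation, and ReLU are assumptions.
# Tensors are nested lists with layout [channel][row][column].

def depthwise_conv3x3(x, kernels):
    """Spatial branch: filter each channel with its own 3x3 kernel (zero padding)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += kernels[c][di + 1][dj + 1] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise_conv1x1(x, weights):
    """Channel branch: each output channel is a per-pixel weighted sum of input channels."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weights[o][c] * x[c][i][j] for c in range(C)) for j in range(W)]
         for i in range(H)]
        for o in range(len(weights))
    ]


def dual_block(x, kernels, weights):
    """Run both branches in parallel on the same input, sum them, apply ReLU."""
    spatial = depthwise_conv3x3(x, kernels)
    channel = pointwise_conv1x1(x, weights)  # must produce len(x) output channels
    return [
        [[max(0.0, s + c) for s, c in zip(s_row, c_row)]
         for s_row, c_row in zip(s_map, c_map)]
        for s_map, c_map in zip(spatial, channel)
    ]


# Toy input: 2 channels, 2x2 spatial size.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
identity_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
kernels = [identity_kernel, identity_kernel]  # spatial branch passes input through
weights = [[0.0, 1.0], [1.0, 0.0]]            # channel branch swaps the two channels
y = dual_block(x, kernels, weights)
```

With the identity kernels and channel-swapping weights used here, each output channel is simply the sum of the two input channels, which makes the contribution of each branch easy to check by hand. In a real network both branches would carry learned weights and would typically be followed by normalization.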
