Rethinking 1D convolution for lightweight semantic segmentation

Lightweight semantic segmentation brings semantic segmentation to resource-constrained devices. Existing lightweight semantic segmentation networks (LSNets) suffer from low accuracy and large parameter counts. To address these problems, we design a fully 1D convolutional LSNet. Its performance is attributed to three modules: the 1D multi-layer space module (1D-MS), the 1D multi-layer channel module (1D-MC), and the flow alignment module (FA). The 1D-MS and 1D-MC modules add global feature extraction operations based on the multi-layer perceptron (MLP) idea. Both use 1D convolutional encoding, which is more flexible than an MLP, and the added global operations improve the encoding ability of the features. The FA module fuses high-level and low-level semantic information, which addresses the accuracy loss caused by feature misalignment. We design a 1D-mixer encoder based on the transformer structure. It performs fusion encoding of the spatial information extracted by the 1D-MS module and the channel information extracted by the 1D-MC module. The 1D-mixer obtains high-quality encoded features with very few parameters, which is the key to the network's success. The attention pyramid with FA (AP-FA) uses an attention pyramid (AP) to decode features and adds an FA module to resolve feature misalignment. Our network requires no pre-training and needs only a single 1080Ti GPU for training. It achieves 72.6% mIoU at 95.6 FPS on the Cityscapes dataset and 70.5% mIoU at 122 FPS on the CamVid dataset. We ported the network trained on the ADE20K dataset to mobile devices, where a latency of 224 ms demonstrates its practical value on mobile hardware. The results on the three datasets indicate that the network generalizes well.
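
The accuracy figures above are mean intersection-over-union (mIoU) scores. As a minimal sketch of how such a score is commonly computed from a per-class confusion matrix, the following may help; the function name `mean_iou` and the toy matrix are illustrative assumptions, not the authors' evaluation code.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                                        # true positives
        fn = sum(confusion[c]) - tp                                 # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # wrongly labeled as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical two-class example: each class has 3 correct pixels,
# 1 false negative, and 1 false positive.
toy_confusion = [[3, 1],
                 [1, 3]]
score = mean_iou(toy_confusion)
print(f"mIoU = {score:.3f}")
```

Published benchmarks usually report this value as a percentage, averaged over the dataset's evaluation classes.
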
Compared with state-of-the-art lightweight semantic segmentation algorithms, our network achieves the best balance between segmentation accuracy and parameter count. LSNet has only 0.62 M parameters, making it currently the most accurate segmentation network with fewer than 1 M parameters.
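
To give a rough sense of why 1D convolutions can keep the parameter count low, the sketch below compares the weights of one k x k 2D convolution with those of a pair of 1D convolutions (k x 1 followed by 1 x k). The channel widths, kernel size, and helper names are assumptions chosen for illustration; they are not the layer configuration of LSNet.

```python
def conv2d_params(c_in, c_out, k, bias=False):
    """Weights in a single k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def conv1d_pair_params(c_in, c_out, k, bias=False):
    """Weights in a k x 1 convolution followed by a 1 x k convolution."""
    first = c_in * c_out * k + (c_out if bias else 0)    # k x 1, c_in -> c_out
    second = c_out * c_out * k + (c_out if bias else 0)  # 1 x k, c_out -> c_out
    return first + second


# Hypothetical layer: 64 input channels, 64 output channels, kernel size 3.
full_2d = conv2d_params(64, 64, 3)
pair_1d = conv1d_pair_params(64, 64, 3)
print(full_2d, pair_1d, pair_1d / full_2d)
```

For equal input and output widths the ratio is 2/k, so the saving grows with kernel size; this is a general property of factorized kernels rather than a statement about the specific modules in this paper.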
