Lightweight and Progressively-Scalable Networks for Semantic Segmentation

Multi-scale learning frameworks have been regarded as a capable class of models to boost semantic segmentation. The problem nevertheless is not trivial especially for the real-world deployments, which often demand high efficiency in inference latency. In this paper, we thoroughly analyze the design of convolutional blocks (the type of convolutions and the number of channels in convolutions), and the ways of interactions across multiple scales, all from lightweight standpoint for semantic segmentation. With such in-depth comparisons, we conclude three principles, and accordingly devise Lightweight and Progressively-Scalable Networks (LPS-Net) that novelly expands the network complexity in a greedy manner. Technically, LPS-Net first capitalizes on the principles to build a tiny network. Then, LPS-Net progressively scales the tiny network to larger ones by expanding a single dimension (the number of convolutional blocks, the number of channels, or the input resolution) at one time to meet the best speed/accuracy tradeoff. Extensive experiments conducted on three datasets consistently demonstrate the superiority of LPS-Net over several efficient semantic segmentation methods. More remarkably, our LPS-Net achieves 73.4% mIoU on Cityscapes test set, with the speed of 413.5FPS on an NVIDIA GTX 1080Ti, leading to a performance improvement by 1.5% and a 65% speed-up against the state-of-the-art STDC. Code is available at \url{https://github.com/YihengZhang-CV/LPS-Net}.

[1]  Ruhui Ma,et al.  Fast and Accurate Scene Parsing via Bi-Direction Alignment Networks , 2021, 2021 IEEE International Conference on Image Processing (ICIP).

[2]  Xilin Chen,et al.  OCNet: Object Context for Semantic Segmentation , 2021, International Journal of Computer Vision.

[3]  Zhenhua Chai,et al.  Rethinking BiSeNet For Real-time Semantic Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Changxin Gao,et al.  Lite-HRNet: A Lightweight High-Resolution Network , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Weichao Sun,et al.  Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes , 2021, ArXiv.

[6]  Ningning Ma,et al.  RepVGG: Making VGG-style ConvNets Great Again , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Tal Hassner,et al.  HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Karan Sapra,et al.  Hierarchical Multi-Scale Attention for Semantic Segmentation , 2020, ArXiv.

[9]  Martin Jägersand,et al.  U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection , 2020, Pattern Recognit..

[10]  Gang Yu,et al.  BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation , 2020, International Journal of Computer Vision.

[11]  Peng Sun,et al.  Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation , 2020, International Journal of Computer Vision.

[12]  Kuiyuan Yang,et al.  Semantic Flow for Fast and Accurate Scene Parsing , 2020, ECCV.

[13]  Qian Zhang,et al.  FasterSeg: Searching for Faster Real-time Semantic Segmentation , 2019, ICLR.

[14]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[15]  Chang Xu,et al.  GhostNet: More Features From Cheap Operations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Gang Yu,et al.  Real-Time Semantic Segmentation via Multiply Spatial Fusion Network , 2019, BMVC.

[17]  Jun Fu,et al.  Adaptive Context Network for Scene Parsing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Jianping Shi,et al.  Graph-Guided Architecture Search for Real-Time Semantic Segmentation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Yang Zhao,et al.  Deep High-Resolution Representation Learning for Visual Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Tao Mei,et al.  daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices , 2019, ACM Multimedia.

[21]  Tao Mei,et al.  Customizable Architecture Search for Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Dani Lischinski,et al.  ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[24]  Jian Sun,et al.  DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Jiashi Feng,et al.  Partial Order Pruning: For Best Speed/Accuracy Trade-Off in Neural Architecture Search , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Roberto Cipolla,et al.  Fast-SCNN: Fast Semantic Segmentation Network , 2019, BMVC.

[27]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[28]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[29]  Trevor Darrell,et al.  BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling , 2018, ArXiv.

[30]  Kurt Keutzer,et al.  SqueezeNext: Hardware-Aware Neural Network Design , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[31]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[32]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Thomas A. Funkhouser,et al.  Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[36]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[37]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Charless C. Fowlkes,et al.  Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation , 2016, ECCV.

[42]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[47]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[49]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[50]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[51]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[52]  Xin Yu,et al.  When Humans Meet Machines: Towards Efficient Segmentation Networks , 2020, BMVC.