论文信息 - Fast-SCNN: Fast Semantic Segmentation Network

Fast-SCNN: Fast Semantic Segmentation Network

The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our `learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.

Roberto Cipolla | Rudra P. K. Poudel | Stephan Liwicki | R. Cipolla | Stephan Liwicki

[1] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[3] Vibhav Vineet,et al. Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[5] Roberto Cipolla,et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Christopher Zach,et al. ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time , 2018, BMVC.

[7] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[10] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[11] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[12] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, ArXiv.

[13] Xiaojuan Qi,et al. ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[14] Eugenio Culurciello,et al. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[15] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Hanan Samet,et al. Pruning Filters for Efficient ConvNets , 2016, ICLR.

[20] Linda G. Shapiro,et al. ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation , 2018, ECCV.

[21] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Yoshua Bengio,et al. ReSeg: A Recurrent Neural Network-Based Model for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[24] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25] Shuang Wu,et al. Training and Inference with Integers in Deep Neural Networks , 2018, ICLR.

[26] Igor Carron,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016 .

[27] Gang Yu,et al. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[28] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[29] Deborah Silver,et al. Feature Visualization , 1994, Scientific Visualization.

[30] Eduardo Romera,et al. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[31] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[32] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[33] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[34] Pascal Fua,et al. Are spatial and global constraints really necessary for segmentation? , 2011, 2011 International Conference on Computer Vision.

[35] Davide Mazzini,et al. Guided Upsampling Network for Real-Time Semantic Segmentation , 2018, BMVC.

[36] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37] Xiaogang Wang,et al. Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).