Fast Semantic Segmentation for Scene Perception

Semantic segmentation is a challenging problem in computer vision. Many applications, such as autonomous driving and robot navigation in urban road scenes, need segmentation that is both accurate and efficient. Most state-of-the-art methods focus on accuracy rather than efficiency. In this paper, we propose a more efficient neural network architecture, with fewer parameters, for semantic segmentation of urban road scenes. Our model uses an asymmetric encoder–decoder structure based on ResNet. In the first stage of the encoder, we use a continuous factorized block to extract low-level features. A continuous dilated block is applied in the second stage, which gives the model a larger field of view while keeping it small and shallow. The decoder upsamples the downsampled encoder features to an output of the same size as the input image and refines the details. Our model can be trained from scratch, end to end and pixel to pixel, without pretraining. It has only 0.2M parameters, 100× fewer than models such as SegNet. Experiments are conducted on five public road scene datasets (CamVid, CityScapes, Gatech, KITTI Road Detection, and KITTI Semantic Segmentation), and the results demonstrate that our model can achieve better performance.
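
The following is a minimal sketch, not the authors' implementation, of why factorized and dilated blocks help keep a model small while widening its field of view. It assumes that "factorized" means replacing a 3×3 convolution with a 3×1 convolution followed by a 1×3 convolution; the abstract does not define the block, and the channel width (64) and dilation rates (1, 2, 4, 8) are hypothetical values chosen only for illustration.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=False):
    """Learnable parameters in one 2-D convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def factorized_params(k, channels, bias=False):
    """Parameters when a k x k convolution is split into k x 1 then 1 x k.

    Assumption: this is one common meaning of a 'factorized' block; the
    abstract does not spell out the exact layer layout.
    """
    return (conv_params(k, 1, channels, channels, bias)
            + conv_params(1, k, channels, channels, bias))


def receptive_field(kernel_sizes, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


channels = 64  # hypothetical width, not taken from the paper
standard = conv_params(3, 3, channels, channels)
factorized = factorized_params(3, channels)
print(standard, factorized)  # 36864 24576

# Four 3-tap layers: plain versus hypothetical dilation rates 1, 2, 4, 8.
print(receptive_field([3] * 4, [1, 1, 1, 1]))  # 9
print(receptive_field([3] * 4, [1, 2, 4, 8]))  # 31
```

Under these assumptions, factorization cuts the weights of each 3×3 layer by one third, and dilation widens the receptive field from 9 to 31 pixels along each axis without adding any parameters.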
