Fast Scene Understanding for Autonomous Driving

Most approaches for instance-aware semantic labeling traditionally focus on accuracy. Other aspects like runtime and memory footprint are arguably as important for real-time applications such as autonomous driving. Motivated by this observation and inspired by recent works that tackle multiple tasks with a single integrated architecture, in this paper we present a real-time efficient implementation based on ENet that solves three autonomous driving related tasks at once: semantic scene segmentation, instance segmentation and monocular depth estimation. Our approach builds upon a branched ENet architecture with a shared encoder but different decoder branches for each of the three tasks. The presented method can run at 21 fps at a resolution of 1024x512 on the Cityscapes dataset without sacrificing accuracy compared to running each task separately.

[1]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[2]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[3]  Xuming He,et al.  Boundary-Aware Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Iasonas Kokkinos,et al.  UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Luc Van Gool,et al.  Semantic Instance Segmentation with a Discriminative Loss Function , 2017, ArXiv.

[6]  A. Owen A robust hybrid of lasso and ridge regression , 2006 .

[7]  Philip H. S. Torr,et al.  Pixelwise Instance Segmentation with a Dynamically Instantiated Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Thomas Brox,et al.  Pixel-Level Encoding and Depth Layering for Instance-Level Semantic Labeling , 2016, GCPR.

[11]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[12]  Min Bai,et al.  Deep Watershed Transform for Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[14]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Xuming He,et al.  Shape-aware Instance Segmentation , 2016, ArXiv.

[16]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[17]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19]  Carsten Rother,et al.  InstanceCut: From Edges to Instances with MultiCut , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Larry D. Hostetler,et al.  The estimation of the gradient of a density function, with applications in pattern recognition , 1975, IEEE Trans. Inf. Theory.

[21]  Roberto Cipolla,et al.  MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving , 2016, 2018 IEEE Intelligent Vehicles Symposium (IV).

[22]  Sepp Hochreiter,et al.  Speeding up Semantic Segmentation for Autonomous Driving , 2016 .

[23]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.