Restricted Deformable Convolution-Based Road Scene Semantic Segmentation Using Surround View Cameras

Understanding the surrounding environment of the vehicle is still one of the challenges for autonomous driving. This paper addresses 360-degree road scene semantic segmentation using surround view cameras, which are widely equipped in existing production cars. First, in order to address large distortion problem in the fisheye images, Restricted Deformable Convolution (RDC) is proposed for semantic segmentation, which can effectively model geometric transformations by learning the shapes of convolutional filters conditioned on the input feature map. Second, in order to obtain a large-scale training set of surround view images, a novel method called zoom augmentation is proposed to transform conventional images to fisheye images. Finally, an RDC based semantic segmentation model is built; the model is trained for real-world surround view images through a multi-task learning architecture by combining real-world images with transformed images. Experiments demonstrate the effectiveness of the RDC to handle images with large distortions, and that the proposed approach shows a good performance using surround view cameras with the help of the transformed images.

[1]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[2]  Lars Petersson,et al.  DecomposeMe: Simplifying ConvNets for End-to-End Learning , 2016, ArXiv.

[3]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Luis Miguel Bergasa,et al.  CNN-based Fisheye Image Real-Time Semantic Segmentation , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[5]  Antonio Manuel López Peña,et al.  Procedural Generation of Videos to Train Deep Action Recognition Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Bolei Zhou,et al.  Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.

[9]  Jun Li,et al.  Dense Transformer Networks , 2017, ArXiv.

[10]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[11]  Jongwoo Lim,et al.  ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[12]  Marc Pollefeys,et al.  Semantic Stixels: Depth is not enough , 2016, 2016 IEEE Intelligent Vehicles Symposium (IV).

[13]  Kenro Miyamoto,et al.  Fish Eye Lens , 1964 .

[14]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Martin Jägersand,et al.  Deep semantic segmentation for automated driving: Taxonomy, roadmap and challenges , 2017, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).

[16]  Uwe Franke,et al.  Low-level fusion of color, texture and depth for robust road scene understanding , 2015, 2015 IEEE Intelligent Vehicles Symposium (IV).

[17]  Jiaying Liu,et al.  Revisiting Batch Normalization For Practical Domain Adaptation , 2016, ICLR.

[18]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Lars Petersson,et al.  Effective Use of Synthetic Data for Urban Scene Semantic Segmentation , 2018, ECCV.

[20]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Ming Yang,et al.  Automatic Parking Based on a Bird's Eye View Vision System , 2014 .

[22]  Lars Petersson,et al.  Incorporating Network Built-in Priors in Weakly-Supervised Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  C. V. Jawahar,et al.  Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.

[25]  Torsten Sattler,et al.  VSO: Visual Semantic Odometry , 2018, ECCV.

[26]  Garrison W. Cottrell,et al.  Understanding Convolution for Semantic Segmentation , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[27]  Vincent Frémont,et al.  Vision-Based People Detection System for Heavy Machine Applications , 2016, Sensors.

[28]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[30]  Ming Yang,et al.  CNN based semantic segmentation for urban traffic scenes using fisheye camera , 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[31]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Ming Yang,et al.  Stixel World Based Long-Term Object Tracking for Intelligent Driving , 2016, ICCSIP.

[33]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[34]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[35]  Philip H. S. Torr,et al.  Combining Appearance and Structure from Motion Features for Road Scene Understanding , 2009, BMVC.

[36]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Juho Kannala,et al.  A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Yann LeCun,et al.  Road Scene Segmentation from a Single Image , 2012, ECCV.

[39]  Sepp Hochreiter,et al.  Speeding up Semantic Segmentation for Autonomous Driving , 2016 .

[40]  Jiaying Liu,et al.  Adaptive Batch Normalization for practical domain adaptation , 2018, Pattern Recognit..

[41]  Arthur Daniel Costea,et al.  Super-sensor for 360-degree environment perception: Point cloud segmentation using image features , 2017, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).

[42]  Eduardo Romera,et al.  ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[43]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.