A Survey on Deep Learning Based Approaches for Scene Understanding in Autonomous Driving

As a prerequisite for autonomous driving, scene understanding has attracted extensive research. With the rise of the convolutional neural network (CNN)-based deep learning technique, research on scene understanding has achieved significant progress. This paper aims to provide a comprehensive survey of deep learning-based approaches for scene understanding in autonomous driving. We categorize these works into four work streams, including object detection, full scene semantic segmentation, instance segmentation, and lane line segmentation. We discuss and analyze these works according to their characteristics, advantages and disadvantages, and basic frameworks. We also summarize the benchmark datasets and evaluation criteria used in the research community and make a performance comparison of some of the latest works. Lastly, we summarize the review work and provide a discussion on the future challenges of the research domain.

[1]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[2]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[3]  Paulo Peixoto,et al.  Multimodal vehicle detection: fusing 3D-LIDAR and color camera data , 2017, Pattern Recognit. Lett..

[4]  Yang Yang,et al.  GC-YOLOv3: You Only Look Once with Global Context Block , 2020, Electronics.

[5]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2015, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Trevor Darrell,et al.  BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling , 2018, ArXiv.

[8]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[9]  Huanyu Wang,et al.  Ultra Fast Structure-aware Deep Lane Detection , 2020, ECCV.

[10]  Hyongsuk Kim,et al.  CED-Net: Crops and Weeds Segmentation for Smart Farming Using a Small Cascaded Encoder-Decoder Architecture , 2020, Electronics.

[11]  Lennart Svensson,et al.  LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks , 2018, Robotics Auton. Syst..

[12]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Joan Serrat,et al.  Robust lane markings detection and road geometry computation , 2010 .

[15]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[16]  Maria Riveiro,et al.  Anomaly Detection for Road Traffic: A Visual Analytics Framework , 2017, IEEE Transactions on Intelligent Transportation Systems.

[17]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[18]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Jiyoung Jung,et al.  Real-Time Road Lane Detection in Urban Areas Using LiDAR Data , 2018, Electronics.

[20]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[21]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[23]  Soumith Chintala,et al.  A MultiPath Network for Object Detection , 2016, BMVC.

[24]  Sergiu Nedevschi,et al.  Probabilistic Lane Tracking in Difficult Road Scenarios Using Stereovision , 2009, IEEE Transactions on Intelligent Transportation Systems.

[25]  Tania Stathaki,et al.  Salient Object Detection Combining a Self-attention Module and a Feature Pyramid Network , 2020, Electronics.

[26]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[27]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[28]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[29]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[30]  Sorin Grigorescu,et al.  A Survey of Deep Learning Techniques for Autonomous Driving , 2020, J. Field Robotics.

[31]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Andreas Geiger,et al.  Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art , 2017, Found. Trends Comput. Graph. Vis..

[33]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[34]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[35]  Yizhou Yu,et al.  FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation , 2019, ArXiv.

[36]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[37]  Xin He,et al.  Scene Text Detection and Recognition: The Deep Learning Era , 2018, International Journal of Computer Vision.

[38]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[39]  Saeid Nahavandi,et al.  Intent Prediction of Pedestrians via Motion Trajectories Using Stacked Recurrent Neural Networks , 2018, IEEE Transactions on Intelligent Vehicles.