Interpretable End-to-End Driving Model for Implicit Scene Understanding

Driving scene understanding aims to obtain comprehensive scene information from sensor data and provide a basis for downstream tasks, which is indispensable for the safety of self-driving vehicles. Specific perception tasks, such as object detection and scene graph generation, are commonly used for this purpose. However, the results of these tasks amount only to sampled characterizations of high-dimensional scene features and are therefore insufficient to represent the full scenario. In addition, the goal of perception tasks is inconsistent with human driving, which focuses only on what may affect the ego-trajectory. Therefore, we propose an end-to-end Interpretable Implicit Driving Scene Understanding (II-DSU) model that extracts implicit high-dimensional scene features as the scene understanding result, guided by a planning module, and validates the plausibility of the scene understanding through auxiliary perception tasks used for visualization. Experimental results on CARLA benchmarks show that our approach achieves new state-of-the-art performance and obtains scene features that embody richer driving-relevant scene information, enabling superior performance of the downstream planning.
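To make the idea concrete, the following is a minimal, hypothetical sketch of the structure the abstract describes: a shared encoder produces an implicit high-dimensional scene feature, a planning head consumes it (so the planning loss guides what the feature encodes), and an auxiliary perception head decodes the same feature only to visualize and validate the scene understanding. All module names, dimensions, and loss weights here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class IIDSUSketch(nn.Module):
    """Toy stand-in for a planning-guided implicit scene understanding model."""
    def __init__(self, feat_dim=256, num_waypoints=4):
        super().__init__()
        # Shared encoder -> implicit scene feature (placeholder MLP on a
        # flattened sensor feature; the real model fuses camera and LiDAR).
        self.encoder = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(),
                                     nn.Linear(512, feat_dim))
        # Planning head: predicts future (x, y) waypoints from the scene feature.
        self.planner = nn.Linear(feat_dim, num_waypoints * 2)
        # Auxiliary perception head: a coarse BEV occupancy grid decoded from the
        # same feature, used only for interpretability/visualization.
        self.aux_bev = nn.Linear(feat_dim, 32 * 32)

    def forward(self, sensor_feat):
        scene_feat = self.encoder(sensor_feat)            # implicit scene understanding
        waypoints = self.planner(scene_feat).view(-1, 4, 2)
        bev_logits = self.aux_bev(scene_feat).view(-1, 1, 32, 32)
        return scene_feat, waypoints, bev_logits

# Training sketch: the planning loss is the main supervision shaping the scene
# feature; the auxiliary perception loss is down-weighted (0.1 here, an assumed
# value) and mainly serves to check what the implicit feature has captured.
model = IIDSUSketch()
sensor_feat = torch.randn(8, 1024)
gt_waypoints = torch.randn(8, 4, 2)
gt_bev = torch.randint(0, 2, (8, 1, 32, 32)).float()
_, waypoints, bev_logits = model(sensor_feat)
loss = nn.functional.l1_loss(waypoints, gt_waypoints) \
     + 0.1 * nn.functional.binary_cross_entropy_with_logits(bev_logits, gt_bev)
loss.backward()
```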
