Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns. On the one hand, comprehensive scene understanding is indispensable: without it, a vehicle is vulnerable to rare but complex traffic situations, such as the sudden emergence of unknown objects. However, reasoning from a global context requires access to multiple types of sensors and adequate fusion of multi-modal sensor signals, which is difficult to achieve. On the other hand, the lack of interpretability in learning models also hampers safety, because failure causes cannot be verified. In this paper, we propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser), that fully processes and fuses information from multi-modal, multi-view sensors to achieve comprehensive scene understanding and adversarial event detection. In addition, our framework generates intermediate interpretable features, which provide richer semantics and are exploited to better constrain actions within safe sets. We conducted extensive experiments on CARLA benchmarks, where our model outperforms prior methods and ranks first on the public CARLA Leaderboard. Our code will be made available at https://github.com/opendilab/InterFuser
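The following is a rough illustration of how interpretable intermediate outputs can be used to keep an action inside a safe set. It assumes that perception provides, for each detected object, its distance along the planned path. The sketch then caps the commanded speed so the vehicle could still brake to a stop before the nearest object. The function name `safe_target_speed`, the braking deceleration, and the safety margin are hypothetical choices. This is a generic stopping-distance rule, not the InterFuser safety controller itself.

```python
import math
from typing import Sequence


def safe_target_speed(
    desired_speed: float,
    obstacle_distances: Sequence[float],
    max_decel: float = 4.0,
    margin: float = 2.0,
) -> float:
    """Cap a desired speed (m/s) so the vehicle can stop before the nearest obstacle.

    obstacle_distances: hypothetical interpretable output, the distance (m) to
        each detected object along the planned path.
    max_decel: assumed braking deceleration (m/s^2).
    margin: assumed buffer (m) kept in front of the nearest obstacle.
    """
    if desired_speed < 0:
        raise ValueError("desired_speed must be non-negative")
    if not obstacle_distances:
        return desired_speed  # nothing detected ahead: no extra constraint
    # Free space left after reserving the safety margin (never negative).
    free_distance = max(0.0, min(obstacle_distances) - margin)
    # Stopping from speed v at constant deceleration a needs v^2 / (2a) metres,
    # so the largest speed that still permits a stop within d is sqrt(2 * a * d).
    v_max = math.sqrt(2.0 * max_decel * free_distance)
    # The action stays in the "safe set" by never exceeding v_max.
    return min(desired_speed, v_max)


if __name__ == "__main__":
    print(safe_target_speed(10.0, [20.0, 35.0]))  # far obstacles: request kept
    print(safe_target_speed(10.0, [6.5]))         # close obstacle: speed capped
```

For example, with an object 6.5 m ahead, the default parameters leave 4.5 m of free space, so a 10 m/s request is capped at 6 m/s. A closed-form rule like this is easy to audit, which is one motivation for exposing interpretable intermediate features instead of relying only on an opaque policy output.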
