DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

End-to-end autonomous driving aims to build a fully differentiable system that takes raw sensor data as input and directly outputs the planned trajectory or control signals of the ego vehicle. State-of-the-art methods usually follow the 'Teacher-Student' paradigm: the teacher model uses privileged information (ground-truth states of surrounding agents and map elements) to learn the driving strategy, while the student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher. By removing perception noise from planning learning, these decoupled methods achieve better performance with significantly less data than coupled ones. However, under the current Teacher-Student paradigm, the student model still needs to learn a planning head from scratch, which is challenging due to the redundant and noisy nature of raw sensor inputs and the causal confusion issue of behavior cloning. In this work, we explore the possibility of directly adopting the strong teacher model to conduct planning while letting the student model focus on perception. We find that even when equipped with a SOTA perception model, directly training the student model to predict the teacher model's required inputs leads to poor driving performance, owing to the large distribution gap between the predicted privileged inputs and the ground truth. To bridge this gap, we propose DriveAdapter, which inserts adapters, trained with a feature alignment objective, between the student (perception) and teacher (planning) modules. Additionally, since the purely learning-based teacher model is itself imperfect and occasionally breaks safety rules, we propose action-guided feature learning with a mask over imperfect teacher features, further injecting the priors of hand-crafted rules into the learning process.
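To make the two training signals concrete, below is a minimal PyTorch-style sketch of how an adapter, a masked feature-alignment loss, and an action-guided term could be combined. Every name here (Adapter, drive_adapter_loss, rule_mask, and so on) is an illustrative assumption for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of DriveAdapter-style training signals.
# All module names, shapes, and the masking rule are assumptions made for
# exposition; they are not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Maps student (perception) features toward the teacher's feature space."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, dim),
        )

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        # Residual form keeps the adapter close to identity at initialization.
        return student_feat + self.net(student_feat)

def drive_adapter_loss(
    adapted_feat: torch.Tensor,   # adapter output on student features
    teacher_feat: torch.Tensor,   # teacher features from privileged inputs
    pred_action: torch.Tensor,    # action decoded by the frozen teacher head
    expert_action: torch.Tensor,  # action after hand-crafted rule correction
    rule_mask: torch.Tensor,      # 1 where the teacher was rule-compliant, else 0
    align_weight: float = 1.0,
) -> torch.Tensor:
    # Feature alignment: pull adapted student features toward teacher features,
    # but only where the teacher's behavior did not violate a safety rule.
    align = F.mse_loss(adapted_feat * rule_mask, teacher_feat.detach() * rule_mask)
    # Action-guided term: the rule-corrected action supervises every sample,
    # including those whose teacher features were masked out above.
    action = F.l1_loss(pred_action, expert_action)
    return align_weight * align + action
```

The intent of the mask in this sketch is that states where the pure learning-based teacher broke a safety rule contribute no feature supervision, while the action loss, computed against rule-corrected actions, still provides a learning signal there.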
