End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

End-to-end approaches to autonomous driving commonly rely on expert demonstrations. Although humans are good drivers, they are not good coaches for end-to-end algorithms that demand dense on-policy supervision. In contrast, automated experts that leverage privileged information can efficiently generate large-scale on-policy and off-policy demonstrations. However, existing automated experts for urban driving make heavy use of hand-crafted rules and perform suboptimally even on driving simulators, where ground-truth information is available. To address these issues, we train a reinforcement learning expert that maps bird’s-eye view images to continuous low-level actions. While setting a new performance upper bound on CARLA, our expert is also a better coach that provides informative supervision signals for imitation learning agents to learn from. Supervised by our reinforcement learning coach, a baseline end-to-end agent with monocular camera input achieves expert-level performance. Our end-to-end agent achieves a 78% success rate while generalizing to a new town and new weather on the NoCrash-dense benchmark, and achieves state-of-the-art performance on the more challenging CARLA LeaderBoard.
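The abstract describes a two-stage setup: a privileged coach that maps bird’s-eye view (BEV) images to continuous low-level actions, and a camera-only student trained to imitate it. Below is a minimal PyTorch sketch of one plausible instantiation; the Beta action parameterization, network sizes, tensor shapes, and the KL-based imitation loss are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (hypothetical shapes and heads) of the coach/student setup:
# a privileged RL coach maps BEV semantic images to a continuous action
# distribution, and a camera-only student is trained to match it.
import torch
import torch.nn as nn
from torch.distributions import Beta, kl_divergence


def make_encoder(in_channels: int) -> nn.Sequential:
    """Small CNN encoder producing a 64-d feature; sizes are illustrative."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 5, stride=2), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )


class Policy(nn.Module):
    """Maps an image to a Beta distribution over two bounded actions
    (e.g. steering and throttle/brake), one common choice for bounded
    continuous control."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.encoder = make_encoder(in_channels)
        self.alpha_head = nn.Sequential(nn.Linear(64, 2), nn.Softplus())
        self.beta_head = nn.Sequential(nn.Linear(64, 2), nn.Softplus())

    def forward(self, image: torch.Tensor) -> Beta:
        h = self.encoder(image)
        # The +1 offset keeps the Beta unimodal, a common stabilizing trick.
        return Beta(self.alpha_head(h) + 1.0, self.beta_head(h) + 1.0)


def imitation_loss(student_dist: Beta, coach_dist: Beta) -> torch.Tensor:
    # Matching the coach's full action *distribution*, rather than a single
    # demonstrated action per state, is one concrete way an automated coach
    # can supply a denser supervision signal than a human demonstrator.
    return kl_divergence(coach_dist, student_dist).sum(-1).mean()


# Usage on dummy tensors: a batch of 4 paired BEV/camera observations.
coach = Policy(in_channels=15)    # privileged BEV input, e.g. 15 semantic channels
student = Policy(in_channels=3)   # monocular RGB input
bev = torch.randn(4, 15, 192, 192)
rgb = torch.randn(4, 3, 144, 256)

with torch.no_grad():             # the coach is frozen during imitation
    coach_dist = coach(bev)
loss = imitation_loss(student(rgb), coach_dist)
loss.backward()                   # gradients flow only into the student
```

In this sketch the coach would first be trained with an on-policy RL algorithm on CARLA using the privileged BEV input; the student then imitates it from camera input alone, which is consistent with the abstract's claim that the coach provides informative supervision for imitation learners.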
