Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations

In this paper we propose a novel end-to-end learnable network that performs joint perception, prediction, and motion planning for self-driving vehicles and produces interpretable intermediate representations. Unlike existing neural motion planners, our motion planning costs are consistent with our perception and prediction estimates. This is achieved through a novel differentiable semantic occupancy representation that is explicitly used as a cost by the motion planning process. Our network is learned end-to-end from human demonstrations. Experiments on a large-scale manual-driving dataset and in closed-loop simulation show that the proposed model significantly outperforms state-of-the-art planners in imitating human behavior while producing much safer trajectories.
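
To illustrate the core idea of a differentiable occupancy cost, here is a minimal sketch (not the paper's actual implementation): candidate ego trajectories are scored by bilinearly sampling predicted per-timestep semantic occupancy maps at each waypoint, so trajectories that pass through likely-occupied space incur higher cost while gradients flow back into the occupancy predictor. The function name, tensor shapes, and grid parameters below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def occupancy_cost(occupancy, trajectories, cell_size=0.5, origin=(0.0, 0.0)):
    """Score candidate ego trajectories against predicted occupancy.

    occupancy:    (T, H, W) tensor of per-timestep occupancy probabilities
                  (e.g. one semantic class such as 'vehicle'). Hypothetical
                  output of a perception-and-prediction backbone.
    trajectories: (N, T, 2) tensor of candidate (x, y) waypoints in metres,
                  in the same bird's-eye-view frame as the occupancy grid.
    Returns an (N,) cost per trajectory: the summed occupancy probability
    sampled at each waypoint. Fully differentiable via bilinear sampling.
    """
    T, H, W = occupancy.shape
    # Convert metric waypoints to normalised grid coordinates in [-1, 1],
    # as expected by grid_sample (x indexes width, y indexes height).
    gx = (trajectories[..., 0] - origin[0]) / (cell_size * W) * 2 - 1
    gy = (trajectories[..., 1] - origin[1]) / (cell_size * H) * 2 - 1
    grid = torch.stack([gx, gy], dim=-1)          # (N, T, 2)

    # Sample timestep t's occupancy map at timestep t's waypoints.
    occ = occupancy.unsqueeze(1)                  # (T, 1, H, W)
    pts = grid.permute(1, 0, 2).unsqueeze(2)      # (T, N, 1, 2)
    samples = F.grid_sample(occ, pts, align_corners=False)  # (T, 1, N, 1)
    return samples.squeeze(3).squeeze(1).sum(dim=0)          # (N,)

# Usage: pick the lowest-cost trajectory from a sampled set.
occ = torch.rand(5, 200, 200)        # 5 future timesteps, 100 m x 100 m grid
cands = torch.randn(64, 5, 2) * 10   # 64 candidate trajectories
best = occupancy_cost(occ, cands).argmin()
```

In the paper's framing this kind of cost term is what makes planning consistent with perception and prediction: the same occupancy estimates that are supervised as interpretable intermediate outputs are the quantities the planner minimises over.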
