MP3: A Unified Model to Map, Perceive, Predict and Plan

High-definition maps (HD maps) are a key component of most modern self-driving systems due to their valuable semantic and geometric information. Unfortunately, building HD maps has proven hard to scale due to their cost as well as the requirements they impose on the localization system, which has to work everywhere with centimeter-level accuracy. Being able to drive without an HD map would be very beneficial for scaling self-driving solutions as well as for increasing the failure tolerance of existing ones (e.g., if localization fails or the map is not up-to-date). Towards this goal, we propose MP3, an end-to-end approach to mapless driving where the input is raw sensor data and a high-level command (e.g., turn left at the intersection). MP3 predicts intermediate representations in the form of an online map and the current and future state of dynamic agents, and exploits them in a novel neural motion planner to make interpretable decisions that take uncertainty into account. We show that our approach is significantly safer, more comfortable, and can follow commands better than the baselines in challenging long-term closed-loop simulations, as well as when compared to an expert driver in a large-scale real-world dataset.
