Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset

As autonomous driving systems mature, motion forecasting has received increasing attention as a critical requirement for planning. Of particular importance are interactive situations, such as merges and unprotected turns, where predicting the motion of each object individually is not sufficient; effective route planning requires joint predictions of multiple objects. There has been a critical need for high-quality motion data, rich in both interactions and annotations, for developing motion planning models. In this work, we introduce what is, to our knowledge, the most diverse interactive motion dataset, and we provide specific labels for interacting objects that are suitable for developing joint prediction models. With over 100,000 scenes, each 20 seconds long and sampled at 10 Hz, the dataset contains more than 570 hours of unique data covering over 1,750 km of roadways. It was collected by mining for interesting interactions between vehicles, pedestrians, and cyclists across six cities in the United States. We use a high-accuracy 3D auto-labeling system to generate high-quality 3D bounding boxes for each road agent, and we provide corresponding high-definition 3D maps for each scene. Furthermore, we introduce a new set of metrics that comprehensively evaluates both single-agent and joint-agent interactive motion forecasting models. Finally, we provide strong baseline models for individual-agent prediction and joint prediction. We hope that this new large-scale interactive motion dataset will open new opportunities for advancing motion forecasting models.
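Why individual-agent predictions are not sufficient for interactive scenes can be illustrated with a minimal sketch. It assumes a simple average displacement error (ADE) and K predicted modes, where each mode is one joint hypothesis for every agent in the scene. The function names `ade`, `marginal_min_ade`, and `joint_min_ade` are hypothetical illustrations, not the dataset's official metric definitions or API. A marginal metric lets each agent pick its own best mode, so it can reward combinations of trajectories that never occur together in any single scene hypothesis. A joint metric requires one mode to explain all agents at once.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]

# Illustrative sketch only; not the official Waymo Open Motion Dataset metric code.


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average Euclidean displacement between two equal-length trajectories."""
    assert len(pred) == len(gt)
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def marginal_min_ade(modes: List[List[Trajectory]], gts: List[Trajectory]) -> float:
    """Each agent independently picks its best mode; average over agents.

    modes[k][i] is the trajectory of agent i in predicted mode k.
    """
    per_agent = [min(ade(mode[i], gt) for mode in modes) for i, gt in enumerate(gts)]
    return sum(per_agent) / len(per_agent)


def joint_min_ade(modes: List[List[Trajectory]], gts: List[Trajectory]) -> float:
    """All agents must share one mode: score the best joint scene hypothesis."""
    return min(
        sum(ade(mode[i], gt) for i, gt in enumerate(gts)) / len(gts)
        for mode in modes
    )


# Hypothetical two-agent scene: each mode gets one agent right and the other wrong.
gt_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
gt_b = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
wrong_a = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]  # offset 2 m from gt_a at every step
wrong_b = [(2.0, 1.0), (2.0, 2.0), (2.0, 3.0)]  # offset 2 m from gt_b at every step
modes = [[gt_a, wrong_b], [wrong_a, gt_b]]

print(marginal_min_ade(modes, [gt_a, gt_b]))  # 0.0
print(joint_min_ade(modes, [gt_a, gt_b]))     # 1.0
```

In this toy scene the marginal score is perfect even though no single predicted mode matches both agents, while the joint score exposes the inconsistency. Since a single shared mode can never do better than each agent's independent best, the marginal value is always a lower bound on the joint one.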
