Argoverse: 3D Tracking and Forecasting With Rich Maps

We present Argoverse, a dataset designed to support autonomous vehicle perception tasks, including 3D tracking and motion forecasting. Argoverse includes sensor data collected by a fleet of autonomous vehicles in Pittsburgh and Miami, as well as 3D tracking annotations, 300k interesting vehicle trajectories extracted from raw log data, and rich semantic maps. The sensor data consists of 360° imagery from 7 cameras with overlapping fields of view, forward-facing stereo imagery, 3D point clouds from long-range LiDAR, and 6-DOF pose. Our 290 km of mapped lanes contain rich geometric and semantic metadata that are not currently available in any public dataset. All data is released under a Creative Commons license at Argoverse.org. In baseline experiments, we use map information such as lane direction, driveable area, and ground height to improve the accuracy of 3D object tracking. We use 3D object tracking to mine more than 300k interesting vehicle trajectories to create a trajectory forecasting benchmark. Motion forecasting experiments ranging in complexity from classical methods (k-NN) to LSTMs demonstrate that using detailed vector maps with lane-level information substantially reduces prediction error. Our tracking and forecasting experiments represent only a superficial exploration of the potential of rich maps in robotic perception. We hope that Argoverse will enable the research community to explore these problems in greater depth.
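To make the classical end of the forecasting baseline spectrum concrete, the sketch below shows a minimal nearest-neighbor trajectory forecaster of the kind the abstract alludes to: given the observed portion of a query trajectory, it retrieves the k training trajectories with the closest observed portions and returns their futures as candidate forecasts. This is an illustrative assumption about how such a baseline could be structured (the function name, array shapes, and distance metric are ours), not the authors' implementation.

```python
import numpy as np

def knn_forecast(train_obs, train_fut, query_obs, k=3):
    """Forecast by retrieving the futures of the k most similar
    training trajectories.

    train_obs: (N, T_obs, 2) observed (x, y) points per training trajectory
    train_fut: (N, T_fut, 2) corresponding future (x, y) points
    query_obs: (T_obs, 2) observed points of the trajectory to forecast
    Returns:   (k, T_fut, 2) candidate future trajectories
    """
    # Summed per-timestep Euclidean distance between each training
    # trajectory's observed portion and the query's observed portion.
    dists = np.linalg.norm(train_obs - query_obs, axis=2).sum(axis=1)  # (N,)
    nearest = np.argsort(dists)[:k]
    return train_fut[nearest]
```

Map-aware variants of this idea could compute distances in a lane-centerline coordinate frame instead of raw (x, y), which is one way lane-level map information can reduce prediction error.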
