HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding

One essential task in autonomous driving is to encode the information in a driving scene into vector representations on which downstream tasks, such as trajectory prediction, can perform well. Driving scenes are complicated. The scene elements are heterogeneous: they carry diverse types of information, e.g., agent dynamics, map routing, and road lines. Meanwhile, the elements are also relative to one another: they have spatial relations, and such relations should be represented canonically in terms of relative measurements, since absolute coordinate values are meaningless. Taking these two observations into consideration, we propose a novel backbone, the Heterogeneous Driving Graph Transformer (HDGT), which models the driving scene as a heterogeneous graph with different types of nodes and edges. For graph construction, each node represents either an agent or a road element, and each edge represents a semantic relation between two nodes, such as Pedestrian-To-Crosswalk or Lane-To-Left-Lane. For spatial relation encoding, instead of setting a fixed global reference frame, the coordinate information of each node and of its in-edges is transformed into a local, node-centric coordinate system. For the aggregation module of the graph neural network (GNN), we adopt the transformer structure in a hierarchical way to fit the heterogeneous nature of the inputs. Experimental results show that the proposed method achieves new state-of-the-art results on the INTERACTION Prediction Challenge and the Waymo Open Motion Challenge, ranking 1st and 2nd respectively in terms of the minADE/minFDE metrics.
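The node-centric encoding of in-edges can be illustrated with a minimal sketch. It assumes that every node carries a single global pose (position plus heading), that edges are stored as directed (source, destination) pairs grouped by relation name, and that a node's local frame has its origin at the node's position with the x-axis along its heading. The names `Node`, `HeteroGraph`, `to_local`, and `in_edge_features` are hypothetical and are not taken from HDGT's implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    node_type: str   # hypothetical types, e.g. "pedestrian", "crosswalk", "lane"
    x: float         # global position in meters
    y: float
    heading: float   # global heading in radians


@dataclass
class HeteroGraph:
    nodes: dict = field(default_factory=dict)
    # Directed (src_id, dst_id) pairs grouped by semantic relation name,
    # e.g. "Pedestrian-To-Crosswalk" or "Lane-To-Left-Lane".
    edges: dict = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, relation: str, src_id: int, dst_id: int) -> None:
        self.edges.setdefault(relation, []).append((src_id, dst_id))


def to_local(center: Node, x: float, y: float, heading: float):
    """Express a global pose in the frame centered at `center`,
    with the x-axis pointing along center.heading."""
    dx, dy = x - center.x, y - center.y
    c, s = math.cos(center.heading), math.sin(center.heading)
    # Rotate the global offset by -center.heading.
    local_x = c * dx + s * dy
    local_y = -s * dx + c * dy
    # Relative heading wrapped to [-pi, pi).
    local_heading = (heading - center.heading + math.pi) % (2 * math.pi) - math.pi
    return local_x, local_y, local_heading


def in_edge_features(graph: HeteroGraph, dst_id: int):
    """Return (relation, src_id, local pose of src) for every edge into dst_id,
    with every pose expressed in dst's node-centric frame."""
    dst = graph.nodes[dst_id]
    features = []
    for relation, pairs in graph.edges.items():
        for src_id, target in pairs:
            if target == dst_id:
                src = graph.nodes[src_id]
                features.append((relation, src_id, to_local(dst, src.x, src.y, src.heading)))
    return features


if __name__ == "__main__":
    g = HeteroGraph()
    g.add_node(Node(0, "crosswalk", 10.0, 5.0, math.pi / 2))
    g.add_node(Node(1, "pedestrian", 10.0, 2.0, math.pi / 2))
    g.add_edge("Pedestrian-To-Crosswalk", 1, 0)
    # The pedestrian sits 3 m behind the crosswalk along its heading: x ~ -3, y ~ 0.
    print(in_edge_features(g, 0))
```

Because only differences between poses enter `to_local`, translating or rotating the whole scene leaves the encoded in-edge features unchanged, which is the invariance a node-centric frame is meant to provide in place of a fixed global reference.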
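The abstract does not spell out how the transformer is applied hierarchically, so the following is only one possible reading, written as a hypothetical sketch: a target node first attends over its in-edges of a single relation type, and then attends over the resulting per-type summaries. To stay short, it uses single-head scaled dot-product attention with identity projections (no learned weights). The names `attend` and `hierarchical_aggregate`, and the relation names in the example, are illustrative assumptions rather than HDGT's API.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.
    Identity projections stand in for learned Q/K/V weights (an assumption)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def hierarchical_aggregate(query, edges_by_type):
    """Level 1: attend over the in-edge features of each relation type.
    Level 2: attend over the per-type summaries."""
    summaries = [attend(query, feats, feats)
                 for _, feats in sorted(edges_by_type.items()) if feats]
    if not summaries:
        return list(query)  # no in-edges: keep the node's own embedding
    return attend(query, summaries, summaries)


if __name__ == "__main__":
    query = [0.0, 0.0]  # a zero query gives uniform attention weights
    # Hypothetical relation names: four lane edges and one crosswalk edge.
    edges = {"Lane-To-Agent": [[1.0, 0.0]] * 4, "Crosswalk-To-Agent": [[0.0, 1.0]]}
    all_feats = edges["Lane-To-Agent"] + edges["Crosswalk-To-Agent"]
    print("flat:", attend(query, all_feats, all_feats))           # about [0.8, 0.2]
    print("hierarchical:", hierarchical_aggregate(query, edges))  # [0.5, 0.5]
```

The toy example shows what a per-type first level changes: with flat attention, the four lane edges outweigh the single crosswalk edge 4:1, whereas the two-level version first collapses each relation type into one summary, so with this uninformative query both types contribute equally before the query-dependent second level.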
