Trajectory Prediction with Graph-based Dual-scale Context Fusion

Motion prediction for traffic participants is essential for a safe and robust automated driving system, especially in cluttered urban environments. However, it is highly challenging due to the complex road topology as well as the uncertain intentions of the other agents. In this paper, we present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP), which encodes both the static and dynamical driving context in a hierarchical manner. Different from methods based on a rasterized map or sparse lane graph, we consider the driving context as a graph with two layers, focusing on both geometrical and topological features. Graph neural networks (GNNs) are applied to extract features with different levels of granularity, and features are subsequently aggregated with attention-based inter-layer networks, realizing better local-global feature fusion. Following the recent goal-driven trajectory prediction pipeline, goal candidates with high likelihood for the target agent are extracted, and predicted trajectories are generated conditioned on these goals. Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories. We evaluate the proposed method on the large-scale Argoverse motion forecasting benchmark, and it achieves promising results, outperforming the recent state-of-the-art methods. We release the code on our project website. 11https://github.com/HKUST-Aerial-Robotics/DSP

[1]  Fabien Moutarde,et al.  GOHOME: Graph-Oriented Heatmap Output for future Motion Estimation , 2021, 2022 International Conference on Robotics and Automation (ICRA).

[2]  Lu Zhang,et al.  EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments , 2021, IEEE Transactions on Robotics.

[3]  Omar Y. Al-Jarrah,et al.  Deep Learning-based Vehicle Behaviour Prediction For Autonomous Driving Applications: A Review , 2019, ArXiv.

[4]  Hang Zhao,et al.  DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Fabien Moutarde,et al.  HOME: Heatmap Output for future Motion Estimation , 2021, 2021 IEEE International Intelligent Transportation Systems Conference (ITSC).

[6]  Jiquan Ngiam,et al.  Large Scale Interactive Motion Forecasting for Autonomous Driving : The Waymo Open Motion Dataset , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Bolei Zhou,et al.  Multimodal Motion Prediction with Stacked Transformers , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Qifeng Chen,et al.  Learning to Predict Vehicle Trajectories with Model-based Planning , 2021, CoRL.

[9]  Renjie Liao,et al.  LaneRCNN: Distributed Representations for Graph-Centric Motion Forecasting , 2021, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Micol Marchetti-Bowick,et al.  Map-Adaptive Goal-Based Trajectory Prediction , 2020, CoRL.

[11]  Yi Shen,et al.  TNT: Target-driveN Trajectory Prediction , 2020, CoRL.

[12]  R. Urtasun,et al.  Learning Lane Graph Representations for Motion Forecasting , 2020, ECCV.

[13]  Alan Yuille,et al.  Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14]  Dragomir Anguelov,et al.  VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  J. Malik,et al.  It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction , 2020, ECCV.

[16]  Freddy A. Boulton,et al.  CoverNet: Multimodal Behavior Prediction Using Trajectory Sets , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Dariu M. Gavrila,et al.  Human motion trajectory prediction: a survey , 2019, Int. J. Robotics Res..

[18]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Benjamin Sapp,et al.  MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction , 2019, CoRL.

[20]  Simon Lucey,et al.  Argoverse: 3D Tracking and Forecasting With Rich Maps , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ying Nian Wu,et al.  Multi-Agent Tensor Fusion for Contextual Trajectory Prediction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Henggang Cui,et al.  Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[23]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[24]  Mohan M. Trivedi,et al.  Convolutional Social Pooling for Vehicle Trajectory Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  Silvio Savarese,et al.  Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[27]  Jean Oh,et al.  Social Attention: Modeling Attention in Human Crowds , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[28]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[29]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[30]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[31]  Philip H. S. Torr,et al.  DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[34]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[35]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[36]  Silvio Savarese,et al.  Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[39]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[40]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[41]  Dizan Vasquez,et al.  A survey on motion prediction and risk assessment for intelligent vehicles , 2014, ROBOMECH Journal.

[42]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[43]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[44]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.