Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction

Human trajectory forecasting is a key component of autonomous vehicles, socially aware robots and advanced video-surveillance applications. The task typically requires knowledge about past motion, the environment and likely destination areas. Multi-modality is a fundamental aspect of the problem, and modeling it effectively can benefit any architecture. Inferring accurate trajectories nevertheless remains difficult because the future is inherently uncertain. To overcome these difficulties, recent models use different inputs and model human intentions with complex fusion mechanisms. In this respect, we propose a lightweight attention-based recurrent backbone that acts solely on past observed positions. Although this backbone already provides promising results, we demonstrate that its prediction accuracy improves considerably when it is combined with a scene-aware goal-estimation module. To this end, we employ a common goal module, based on a U-Net architecture, which additionally extracts semantic information to predict scene-compliant destinations. We conduct extensive experiments on publicly available datasets (namely SDD, inD and ETH/UCY) and show that our approach performs on par with state-of-the-art techniques while reducing model complexity.
