A Location-Velocity-Temporal Attention LSTM Model for Pedestrian Trajectory Prediction

Pedestrian trajectory prediction is fundamental to a wide range of scientific research and industrial applications. Most current advanced trajectory prediction methods incorporate context information, such as the pedestrian’s neighbourhood, labelled static obstacles, and the background scene, into the prediction process. In contrast to these methods, which require rich context, our method predicts a pedestrian’s future trajectory using only the observed part of that pedestrian’s trajectory. Our method, which we refer to as LVTA, is a Location-Velocity-Temporal Attention LSTM model in which two temporal attention mechanisms are applied to the hidden state vectors from the location and velocity LSTM layers. In addition, a location-velocity attention layer embedded inside a tweak module refines the predicted location and velocity coordinates before they are passed to the next time step. Extensive experiments on three large benchmark datasets and comparisons with eleven existing trajectory prediction methods demonstrate that LVTA achieves competitive prediction performance. Specifically, LVTA attains an Average Displacement Error (ADE) of 9.19 pixels and a Final Displacement Error (FDE) of 17.28 pixels on the Central Station dataset, and an ADE of 0.46 metres and an FDE of 0.92 metres on the ETH&UCY datasets. Furthermore, evaluations in which LVTA generates trajectories of different prediction lengths, and in which it is applied to new scenes without retraining, confirm that it generalizes well.
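
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how ADE and FDE are commonly computed. It assumes the usual definitions (ADE as the mean Euclidean distance over all predicted time steps, FDE as the Euclidean distance at the final predicted time step, both averaged over trajectories). The function name `displacement_errors` and the array layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for predicted vs. ground-truth trajectories.

    Both inputs are assumed to have shape (num_trajectories, num_steps, 2),
    holding (x, y) coordinates in the same unit (pixels or metres).
    This layout is an assumption made for illustration.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")

    # Euclidean distance at every time step: shape (num_trajectories, num_steps)
    dists = np.linalg.norm(pred - truth, axis=-1)

    ade = dists.mean()          # average over all steps and trajectories
    fde = dists[:, -1].mean()   # final step only, averaged over trajectories
    return ade, fde


# Toy example: one trajectory with three predicted steps.
pred = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
truth = [[[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]]
ade, fde = displacement_errors(pred, truth)
```

In this toy example the per-step distances are 5, 0, and 10, so the ADE is their mean and the FDE is the last value. The result carries the unit of the input coordinates, which is why the abstract reports pixels for one dataset and metres for the other.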
