Location-Velocity Attention for Pedestrian Trajectory Prediction

Pedestrian path forecasting is crucial in applications such as smart video surveillance. It is a challenging task because of the complex crowd movement patterns in real scenes. Most existing state-of-the-art LSTM-based prediction methods require rich context, such as labelled static obstacles, labelled entrance/exit regions, and even the background scene. Incorporating this contextual information into trajectory prediction increases the computational overhead and reduces how well the prediction models generalize across different scenes. In this paper, we propose a joint Location-Velocity Attention LSTM-based method to predict trajectories. Specifically, a module is designed to tweak the LSTM network, and an attention mechanism is trained to learn how to optimally combine the location and velocity information of pedestrians in the prediction process. We have evaluated our approach against baselines and state-of-the-art methods on several publicly available datasets. The results show that it not only outperforms other prediction methods but also generalizes well.
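The idea of attending over location and velocity information can be illustrated with a toy sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the names `velocities` and `fuse` are hypothetical, the two "streams" are simple hand-written stand-ins rather than LSTMs, and the attention scores are fixed numbers where a trained model would produce them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def velocities(track):
    """Finite-difference velocities from a list of (x, y) positions."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def fuse(loc_pred, vel_pred, loc_score, vel_score):
    """Blend two candidate next positions with attention weights.

    loc_score and vel_score stand in for learned attention logits
    (hypothetical; a trained network would compute them).
    """
    w_loc, w_vel = softmax([loc_score, vel_score])
    fused = (
        w_loc * loc_pred[0] + w_vel * vel_pred[0],
        w_loc * loc_pred[1] + w_vel * vel_pred[1],
    )
    return fused, (w_loc, w_vel)


# Observed positions of one pedestrian (made-up numbers).
track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]

# Velocity stream stand-in: extrapolate with the last observed velocity.
last_v = velocities(track)[-1]
vel_pred = (track[-1][0] + last_v[0], track[-1][1] + last_v[1])

# Location stream stand-in: predict that the pedestrian stays in place.
loc_pred = track[-1]

# Equal scores give equal attention weights to both streams.
fused, weights = fuse(loc_pred, vel_pred, 0.0, 0.0)
print(fused, weights)
```

Raising `vel_score` relative to `loc_score` shifts the fused prediction toward the velocity-based extrapolation; in a learned model, these scores would vary per pedestrian and per time step.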
