SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction

In crowd scenarios, reliable pedestrian trajectory prediction requires a sound understanding of social behaviors. These behaviors have been investigated in many studies, but they are hard to express fully with hand-crafted rules. Recent studies based on LSTM networks have shown a strong ability to learn social behaviors. However, many of these methods rely on the previous hidden states of neighbors and ignore the neighbors' important current intentions. To address this issue, we propose a data-driven state refinement module for LSTM networks (SR-LSTM), which makes use of the current intentions of neighbors and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism. To effectively extract the social effect of neighbors, we further introduce a social-aware information selection mechanism, consisting of an element-wise motion gate and a pedestrian-wise attention, that selects useful messages from neighboring pedestrians. Experimental results on two public datasets, ETH and UCY, demonstrate the effectiveness of the proposed SR-LSTM, which achieves state-of-the-art results.
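The refinement idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `refine_states`, the parameter names `W_g`, `w_a`, `W_m`, the pairwise feature layout, the additive cell-state update, and the reuse of a fixed output gate `o` are all assumptions made for illustration. It only shows the general pattern of an element-wise gate and a per-neighbor attention weight selecting messages that then refine every pedestrian's state over a few iterations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def refine_states(h, c, o, pos, params, n_iters=2):
    """Hypothetical state refinement by message passing.

    h, c, o : (n, d) hidden states, cell states, and output-gate activations
    pos     : (n, 2) current positions of the n pedestrians
    params  : dict with W_g (2 + 2d, d), w_a (2 + 2d,), W_m (d, d)
    """
    n, d = h.shape
    if n < 2:
        return h, c                       # no neighbors, nothing to refine
    W_g, w_a, W_m = params["W_g"], params["w_a"], params["W_m"]
    mask = ~np.eye(n, dtype=bool)         # a pedestrian is not its own neighbor
    rel = pos[None, :, :] - pos[:, None, :]   # rel[i, j] = pos[j] - pos[i]
    for _ in range(n_iters):
        # Pairwise features [relative position, h_i, h_j], shape (n, n, 2 + 2d).
        h_i = np.broadcast_to(h[:, None, :], (n, n, d))
        h_j = np.broadcast_to(h[None, :, :], (n, n, d))
        feat = np.concatenate([rel, h_i, h_j], axis=-1)
        # Element-wise gate: which features of neighbor j matter to i.
        gate = sigmoid(feat @ W_g)        # (n, n, d)
        # Pedestrian-wise attention: softmax over neighbors j for each i.
        scores = np.where(mask, feat @ w_a, -np.inf)
        scores = scores - scores.max(axis=1, keepdims=True)
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=1, keepdims=True)
        # Aggregate the selected messages and refine the cell state.
        message = np.einsum("ij,ijd->id", weights, gate * h_j)
        c = c + message @ W_m
        h = o * np.tanh(c)                # standard LSTM readout of the hidden state
    return h, c
```

Because every pedestrian's state is updated from the current states of the others in each pass, running more than one iteration lets information propagate beyond immediate neighbors, which is the intuition behind refining all states jointly and iteratively.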
