Embedding group and obstacle information in LSTM networks for human trajectory prediction in crowded scenes

Abstract: Recurrent neural networks have proven effective at learning the spatio-temporal dependencies of moving agents in crowded scenes. Recently, they have been adopted to predict pedestrian motion by learning the motion of each individual in the crowd relative to its neighbours. Crowded scenes present a wide variety of situations, which depend not only on the agents' positions but also on the structure of the environment, the density of the crowd, and the social relationships between pedestrians. In this work we propose a framework that improves state-of-the-art models of crowd motion prediction by enriching the learning model with the social relationships between pedestrians walking in the crowd, as well as the layout of the environment. We observe that socially related people tend to exhibit coherent motion patterns. By exploiting this motion coherence, we can cluster trajectories with similar motion properties and improve trajectory prediction, especially at the group level. We also incorporate the layout of the environment into the model, to obtain a more realistic and reliable learning framework. We evaluate our approach on standard crowd benchmark datasets and show that it improves trajectory prediction accuracy.
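
As a rough illustration of the idea of clustering trajectories by coherent motion, the sketch below groups time-aligned tracks that stay close to each other and move with similar average velocity. It is not the method of the paper: the function names, the thresholds `max_dist` and `max_vel_diff`, and the union-find grouping are all assumptions chosen to keep the example small.

```python
from math import hypot

def mean_velocity(track):
    """Average per-step displacement (vx, vy) of a list of (x, y) points."""
    steps = len(track) - 1  # assumes at least two points per track
    return ((track[-1][0] - track[0][0]) / steps,
            (track[-1][1] - track[0][1]) / steps)

def mean_distance(a, b):
    """Average Euclidean distance between two equally long, time-aligned tracks."""
    return sum(hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b)) / len(a)

def group_coherent(tracks, max_dist=1.5, max_vel_diff=0.3):
    """Label tracks so that pairs that stay close and move alike share a label.

    Thresholds are hypothetical values in the same units as the coordinates.
    """
    parent = list(range(len(tracks)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    vels = [mean_velocity(t) for t in tracks]
    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            dv = hypot(vels[i][0] - vels[j][0], vels[i][1] - vels[j][1])
            if dv <= max_vel_diff and mean_distance(tracks[i], tracks[j]) <= max_dist:
                parent[find(j)] = find(i)  # merge the two groups
    return [find(i) for i in range(len(tracks))]

tracks = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # walks right
    [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)],  # walks right, alongside the first
    [(2.0, 0.2), (1.0, 0.2), (0.0, 0.2)],  # walks left, opposite direction
]
labels = group_coherent(tracks)
```

In this toy input the first two tracks receive the same label, while the third, which passes nearby but in the opposite direction, is kept separate. Group labels of this kind could in principle be supplied to a prediction model as an additional input, which is the general direction the abstract describes.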
