Human trajectory prediction in crowded scene using social-affinity Long Short-Term Memory

Abstract

Object tracking in crowded spaces is a challenging but important task in computer vision applications. Predicting complex human mobility in a crowded scene is difficult because of the interactions among large numbers of pedestrians and the common social rules they follow. This paper proposes a novel model for human trajectory prediction in crowded scenes, called the social-affinity LSTM model. Our model learns general human mobility patterns and predicts each individual's trajectory from their past positions, taking into account the influence of their neighbors through the Social Affinity Map (SAM). The SAM clusters the relative positions of surrounding individuals and represents the distribution of these relative positions with bins that carry semantic descriptions. We formulate trajectory prediction, together with the interactions among people, as a sequence generation task with social affinity. The proposed model uses an LSTM to learn general human motion patterns, and uses the Social Affinity Map to connect neighbors through a weight matrix whose elements correspond to SAM bins, so that it learns the social dependencies between correlated pedestrians. By capturing each object's past positions and connecting the hidden states of its neighbors in different SAM bins to different elements of the weight matrix, the social-affinity LSTM predicts the trajectory of each pedestrian from its own features and its neighbors' influence. We compare the performance of our method with the Social LSTM model on several public datasets. Our model outperforms state-of-the-art methods on these datasets, especially on those that exhibit more social-affinity phenomena.
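The abstract describes two mechanisms: assigning each neighbor to a SAM bin according to its relative position, and combining the neighbors' hidden states with bin-specific weights. The Python sketch below illustrates both under loudly stated assumptions; it is not the authors' implementation. The bin layout (three distance rings times four angular sectors, with outer radii of 1, 2 and 4 m), the names `sam_bin`, `social_affinity_pool` and `matvec`, and the choice to sum hidden states within a bin and then apply one weight matrix per bin are all hypothetical. Angles are measured in the world frame rather than relative to the pedestrian's heading, and the hidden states and weights are passed in directly instead of coming from a trained LSTM.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical SAM layout: distance rings x angular sectors (not taken from the paper).
RING_EDGES = (1.0, 2.0, 4.0)  # assumed outer radius of each ring, in metres
NUM_SECTORS = 4               # assumed number of angular sectors per ring
NUM_BINS = len(RING_EDGES) * NUM_SECTORS


def sam_bin(dx: float, dy: float) -> int:
    """Return the SAM bin index of a neighbour at offset (dx, dy), or -1 if it is too far."""
    dist = math.hypot(dx, dy)
    ring = next((r for r, edge in enumerate(RING_EDGES) if dist < edge), None)
    if ring is None:
        return -1  # outside the outermost ring: ignored
    angle = math.atan2(dy, dx) % (2 * math.pi)  # world-frame angle in [0, 2*pi]
    # The final modulo folds a rounding result of exactly 2*pi back into sector 0.
    sector = int(angle // (2 * math.pi / NUM_SECTORS)) % NUM_SECTORS
    return ring * NUM_SECTORS + sector


def matvec(w: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def social_affinity_pool(
    i: int,
    positions: Sequence[Tuple[float, float]],
    hidden: Sequence[Vector],
    weights: Sequence[Matrix],
) -> Vector:
    """Sum neighbours' hidden states per SAM bin, then weight each bin by its own matrix."""
    dim = len(hidden[i])
    xi, yi = positions[i]
    pooled: Dict[int, Vector] = {}
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue  # a pedestrian is not its own neighbour
        b = sam_bin(xj - xi, yj - yi)
        if b < 0:
            continue
        acc = pooled.setdefault(b, [0.0] * dim)
        for k in range(dim):
            acc[k] += hidden[j][k]
    out = [0.0] * dim
    for b, vec in pooled.items():
        contrib = matvec(weights[b], vec)  # bin-specific weighting
        for k in range(dim):
            out[k] += contrib[k]
    return out


if __name__ == "__main__":
    positions = [(0.0, 0.0), (0.5, 0.0), (0.0, 3.0), (10.0, 10.0)]
    hidden = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Toy per-bin weights: bin b scales its pooled state by 0.1 * (b + 1).
    weights = [[[0.1 * (b + 1), 0.0], [0.0, 0.1 * (b + 1)]] for b in range(NUM_BINS)]
    print(social_affinity_pool(0, positions, hidden, weights))
```

In the toy run, the neighbor at 0.5 m falls in the inner ring, the one at 3 m falls in the outer ring and receives a different weight, and the pedestrian more than 4 m away is ignored. In a full model, the pooled vector would presumably be fed, together with the pedestrian's embedded position, into that pedestrian's LSTM at the next time step, with the per-bin matrices learned during training; both steps are omitted here.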
