Forecasting Interactive Dynamics of Pedestrians with Fictitious Play

We develop predictive models of pedestrian dynamics that use game theory to encode the coupled nature of multi-pedestrian interaction and deep learning-based visual analysis to estimate person-specific behavior parameters. We focus on predictive models because they are important for developing interactive autonomous systems (e.g., autonomous cars, home robots, smart homes) that can understand varied human behavior and respond pre-emptively to future human actions. Building predictive models for multi-pedestrian interactions, however, is very challenging for two reasons: (1) the dynamics of interaction are complex, interdependent processes in which the decision of one person can affect others, and (2) the dynamics are variable, in that each person may behave differently (e.g., an older person may walk slowly while a younger person may walk faster). To address these challenges, we use concepts from game theory to model the intertwined decision-making process of multiple pedestrians, and we use visual classifiers to learn a mapping from pedestrian appearance to behavior parameters. We evaluate our proposed model on several public multi-pedestrian interaction video datasets. Results show that our strategic planning model predicts and explains human interactions 25% better than a state-of-the-art activity forecasting method.
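As a rough illustration of the game-theoretic idea named in the title, the sketch below runs textbook fictitious play for two pedestrians approaching head-on. In fictitious play, each player best-responds to the empirical frequency of the other player's past actions. This is not the authors' model: the function name `fictitious_play`, the two-action "pass on side 0 / side 1" game, the payoff matrices, and the prior counts are all hypothetical choices made only for illustration.

```python
import numpy as np


def fictitious_play(payoff_a, payoff_b, prior_a, prior_b, rounds=20):
    """Two-player fictitious play (illustrative sketch, not the paper's method).

    payoff_a[i, j] / payoff_b[i, j]: payoff to A / B when A plays i and B plays j.
    prior_a: A's initial pseudo-counts over B's actions.
    prior_b: B's initial pseudo-counts over A's actions.
    Returns the list of (action_a, action_b) pairs played in each round.
    """
    counts_of_b = np.asarray(prior_a, dtype=float).copy()  # A's belief about B
    counts_of_a = np.asarray(prior_b, dtype=float).copy()  # B's belief about A
    history = []
    for _ in range(rounds):
        belief_b = counts_of_b / counts_of_b.sum()
        belief_a = counts_of_a / counts_of_a.sum()
        # Each player best-responds to the empirical distribution of the other.
        action_a = int(np.argmax(payoff_a @ belief_b))
        action_b = int(np.argmax(belief_a @ payoff_b))
        # Both players observe the joint action and update their counts.
        counts_of_b[action_b] += 1
        counts_of_a[action_a] += 1
        history.append((action_a, action_b))
    return history


# Hypothetical coordination game: both avoid a collision (payoff 1)
# only if they pick the same passing side.
payoff = np.eye(2)
# Assumed priors: A strongly expects B to take side 0,
# while B mildly expects A to take side 1.
history = fictitious_play(payoff, payoff, prior_a=[4.0, 1.0], prior_b=[1.0, 1.5])
print(history[:3])
```

With these assumed priors, the two pedestrians miscoordinate in the first round and then settle on the same side once B's belief about A shifts. The person-specific behavior parameters described in the abstract would presumably enter through the payoffs or priors, but that mapping is not specified here and is left as an assumption.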
