Action-Reaction: Forecasting the Dynamics of Human Interaction

Forecasting human activities from visual evidence is an emerging area of research which aims to allow computational systems to make predictions about unseen human actions. We explore the task of activity forecasting in the context of dual-agent interactions to understand how the actions of one person can be used to predict the actions of another. We model dual-agent interactions as an optimal control problem, where the actions of the initiating agent induce a cost topology over the space of reactive poses – a space in which the reactive agent plans an optimal pose trajectory. The technique developed in this work employs a kernel-based reinforcement learning approximation of the soft maximum value function to deal with the high-dimensional nature of human motion and applies a mean-shift procedure over a continuous cost function to infer a smooth reaction sequence. Experimental results show that our proposed method is able to properly model human interactions in a high dimensional space of human poses. When compared to several baseline models, results show that our method is able to generate highly plausible simulations of human interaction.

[1]  Jake K. Aggarwal,et al.  Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Dorin Comaniciu,et al.  Real-time tracking of non-rigid objects using mean shift , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Sergey Levine,et al.  Continuous character control with low-dimensional embeddings , 2012, ACM Trans. Graph..

[5]  Siddhartha S. Srinivasa,et al.  Legibility and predictability of robot motion , 2013, 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[6]  Tsuhan Chen,et al.  Spatio-Temporal Phrases for Activity Recognition , 2012, ECCV.

[7]  Fernando De la Torre,et al.  Max-margin early event detectors , 2012, CVPR.

[8]  Michael S. Ryoo,et al.  Human activity prediction: Early recognition of ongoing activities from streaming videos , 2011, 2011 International Conference on Computer Vision.

[9]  Jessica K. Hodgins,et al.  Interactive control of avatars animated with human motion data , 2002, SIGGRAPH.

[10]  Zoran Popovic,et al.  Motion fields for interactive character locomotion , 2010, CACM.

[11]  Jake K. Aggarwal,et al.  An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010 , 2010, ICPR Contests.

[12]  Martial Hebert,et al.  Activity Forecasting , 2012, ECCV.

[13]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Hema Swetha Koppula,et al.  Anticipating Human Activities Using Object Affordances for Reactive Robotic Response , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Andrew Y. Ng,et al.  Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.

[16]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[17]  Dimitris Samaras,et al.  Two-person interaction detection using body-pose features and multiple instance learning , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[18]  Martial Hebert,et al.  Patch to the Future: Unsupervised Visual Prediction , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Fernando De la Torre,et al.  Max-Margin Early Event Detectors , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Amit K. Roy-Chowdhury,et al.  A “string of feature graphs” model for recognition of complex activities in natural videos , 2011, 2011 International Conference on Computer Vision.

[21]  Sven J. Dickinson,et al.  Recognize Human Activities from Partially Observed Videos , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Jessica K. Hodgins,et al.  Construction and optimal search of interpolated motion graphs , 2007, ACM Trans. Graph..

[23]  Lucas Kovar,et al.  Motion graphs , 2002, SIGGRAPH '08.

[24]  Liming Xiang,et al.  Kernel-Based Reinforcement Learning , 2006, ICIC.

[25]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[26]  Alex Bateman,et al.  An introduction to hidden Markov models. , 2007, Current protocols in bioinformatics.

[27]  Matthieu Guillaumin,et al.  Segmentation Propagation in ImageNet , 2012, ECCV.

[28]  Nancy S. Pollard,et al.  To appear in the ACM SIGGRAPH conference proceedings Responsive Characters from Motion Fragments , 2022 .

[29]  R. Bellman A Markovian Decision Process , 1957 .