Inferring human intent from video by sampling hierarchical plans

This paper presents a method that allows robots to infer a human's hierarchical intent from partially observed RGBD videos by imagining how the human will behave in the future. This capability is critical for building robots that can interact socially or collaborate with humans. We represent intent as a novel hierarchical, compositional, and probabilistic And-Or graph that describes the relationship between actions and plans. We infer human intent by reverse-engineering the human's decision-making and action-planning processes within a Bayesian probabilistic programming framework. Experiments in a 3D environment demonstrate that the inferred intent (1) matches human judgment well and (2) provides useful contextual cues for object tracking and action recognition.
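The idea of inferring intent by sampling plans from a probabilistic And-Or graph can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the paper's model: the graph `GRAPH`, its node names and branching probabilities, and the helpers `expand`, `sample_intent_and_plan`, and `infer_intent` are hypothetical. The sketch also swaps the paper's inference over RGBD video for plain rejection sampling over exactly observed symbolic actions.

```python
import random
from collections import Counter

# Hypothetical toy And-Or graph (illustrative only, not from the paper).
# ("or", [(prob, child), ...]) selects one child at random;
# ("and", [child, ...]) expands every child in order;
# any name that is not a key of GRAPH is a terminal action.
GRAPH = {
    "root": ("or", [(0.5, "make_coffee"), (0.5, "make_tea")]),
    "make_coffee": ("and", ["get_mug", "coffee_source", "pour"]),
    "make_tea": ("and", ["get_mug", "boil_water", "add_teabag", "pour"]),
    "coffee_source": ("or", [(0.7, "use_machine"), (0.3, "add_instant")]),
}


def choose(children, rng):
    """Pick one child of an Or-node according to its branching probabilities."""
    weights = [p for p, _ in children]
    options = [c for _, c in children]
    return rng.choices(options, weights=weights, k=1)[0]


def expand(node, rng):
    """Recursively expand a node into a sequence of terminal actions."""
    if node not in GRAPH:
        return [node]
    kind, children = GRAPH[node]
    if kind == "and":
        return [action for child in children for action in expand(child, rng)]
    return expand(choose(children, rng), rng)


def sample_intent_and_plan(rng):
    """Sample a top-level intent from the root Or-node, then a full plan for it."""
    _, children = GRAPH["root"]
    intent = choose(children, rng)
    return intent, expand(intent, rng)


def infer_intent(observed, n_samples=5000, seed=0):
    """Approximate P(intent | observed action prefix) by rejection sampling.

    A sampled plan is kept only if it begins with the observed actions.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        intent, plan = sample_intent_and_plan(rng)
        if plan[:len(observed)] == observed:
            counts[intent] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


if __name__ == "__main__":
    print(infer_intent(["get_mug"]))
    print(infer_intent(["get_mug", "boil_water"]))
```

In this toy graph, observing only `get_mug` leaves both intents roughly equally likely, because every plan starts with that action; observing `boil_water` next rules out `make_coffee`. Exact prefix matching is the simplest possible likelihood; a model working from noisy video would need to score samples against uncertain detections instead of accepting or rejecting them outright.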
