Minimalist plans for interpreting manipulation actions

Humans attribute meaning to actions: they can recognise, imitate, predict, compose from parts, and analyse complex actions performed by other humans. We have built a model of action representation and understanding that takes as input perceptual data of humans performing manipulation actions and finds a semantic interpretation of them. It does so by representing actions as minimal plans built from a few primitives. The motivation for our approach is to obtain a description that abstracts away the variations in how humans perform actions. The model can represent complex activities on the basis of simple actions. The primitives of these minimal plans are embodied in the physicality of the system doing the analysis. The model understands an observed action by recognising which plan is occurring; because its primitives are rooted in its own physical structure, it gains a semantic and causal understanding of what it observes. By reasoning over plans, the model treats actions and complex activities in terms of causality, composition, and goal achievement, which enables it to perform complex tasks such as prediction of primitives, separation of interleaved actions, and filtering of perceptual input. We apply the model to an action dataset of humans using hand tools on objects in a constrained universe, and show that it can understand an activity it has not seen before in terms of actions whose plans it knows. The model thus illustrates a novel approach by which a robot can understand human actions.
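
To make the representation concrete, the sketch below (Python, illustrative only and not the authors' implementation) shows one way actions could be encoded as short plans over a handful of embodied primitives and recognised from an observed primitive stream. The primitive names, the example plans, and the subsequence-matching strategy are assumptions made for illustration.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): actions as
# short plans over a few embodied primitives, recognised from an observed
# stream of primitives produced by perception.

from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Primitive:
    name: str   # e.g. "grasp", "move", "apply", "release"
    obj: str    # object or tool the primitive acts on

# Each action is a minimal plan: an ordered list of primitives achieving its goal.
PLANS: Dict[str, List[Primitive]] = {
    "cut_bread": [
        Primitive("grasp", "knife"),
        Primitive("move", "knife"),     # bring the tool to the object
        Primitive("apply", "knife"),    # tool acts on the object
        Primitive("release", "knife"),
    ],
    "pour_water": [
        Primitive("grasp", "cup"),
        Primitive("move", "cup"),
        Primitive("tilt", "cup"),
        Primitive("release", "cup"),
    ],
}

def recognise(observed: List[Primitive]) -> Dict[str, int]:
    """Match known plans against an observed primitive stream.

    Each plan is matched as a subsequence, so two interleaved actions can
    both be recovered from a single stream.  Returns how many primitives of
    each plan were observed: a full count means the action completed, a
    partial count lets us predict the primitives still to come.
    """
    progress = {}
    for action, plan in PLANS.items():
        i = 0
        for p in observed:
            if i < len(plan) and p == plan[i]:
                i += 1
        progress[action] = i
    return progress

if __name__ == "__main__":
    # Interleaved observation: cutting bread while starting to pour water.
    stream = [
        Primitive("grasp", "knife"),
        Primitive("grasp", "cup"),
        Primitive("move", "knife"),
        Primitive("apply", "knife"),
        Primitive("move", "cup"),
        Primitive("release", "knife"),
    ]
    print(recognise(stream))
    # -> {'cut_bread': 4, 'pour_water': 2}: "cut_bread" is complete, while the
    #    partial match for "pour_water" predicts "tilt" and "release" next.
```

In this toy reading, a full match signals a completed action, while a partial match both separates interleaved actions and predicts the remaining primitives, mirroring the prediction and separation tasks described in the abstract.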
