A Cognitive System for Human Manipulation Action Understanding

This paper describes the architecture of a cognitive system that interprets human manipulation actions from perceptual information (image and depth data). The system consists of perceptual modules and reasoning modules that interact with each other. Our contributions address two core problems at the heart of action understanding: (a) grounding relevant information about actions in perception (the perception-action integration problem), and (b) organizing perceptual and high-level symbolic information for interpreting the actions (the sequencing problem). At the high level, actions are represented with the Manipulation Action Grammar, a context-free grammar with associated parsing algorithms, which organizes an action as a sequence of sub-events. Each sub-event is described by the hand, its movements, and the objects and tools involved; the relevant information about these quantities is obtained from biologically inspired perception modules. These modules track the hands and objects and recognize the hand grasp, the objects, and the actions using attention, segmentation, and feature description. Experiments on a new dataset of manipulation actions show that our system can successfully extract the relevant visual information and semantic representation, which a cognitive agent could further use for reasoning, prediction, and planning.
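
To make the grammar-based sequencing concrete, the following is a minimal sketch in Python of how a stream of recognized sub-event symbols could be parsed against a context-free grammar using the classic CYK algorithm. The grammar, nonterminal names, and token vocabulary below are hypothetical illustrations for exposition only, not the paper's actual Manipulation Action Grammar.

```python
# Sketch of CFG-based action sequencing: parse an observed stream of
# sub-event symbols with a toy context-free grammar via CYK parsing.
# The rules and lexicon are illustrative assumptions, NOT the paper's
# Manipulation Action Grammar.

from itertools import product

# Grammar in Chomsky normal form. An action phrase (AP) combines a hand
# phrase (HP) with an action applied to an object (AO); sub-events chain
# recursively through AP -> AP AP.
BINARY_RULES = {
    ("HP", "AO"): "AP",   # AP -> HP AO  (hand performs action-on-object)
    ("A", "O"): "AO",     # AO -> A O    (action applied to object)
    ("AP", "AP"): "AP",   # AP -> AP AP  (sequence of sub-events)
}
LEXICON = {
    "hand": "HP",
    "grasp": "A", "cut": "A", "place": "A",
    "knife": "O", "bread": "O",
}

def cyk_parse(tokens, start="AP"):
    """Return True if `tokens` is derivable from `start` (CYK algorithm)."""
    n = len(tokens)
    # table[i][j] = set of nonterminals deriving tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0].add(LEXICON[tok])
    for span in range(2, n + 1):               # span length
        for i in range(n - span + 1):          # span start
            for split in range(1, span):       # split point within span
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for l, r in product(left, right):
                    if (l, r) in BINARY_RULES:
                        table[i][span - 1].add(BINARY_RULES[(l, r)])
    return start in table[0][n - 1]

# An observed sub-event stream: the hand grasps the knife, then cuts bread.
observed = ["hand", "grasp", "knife", "hand", "cut", "bread"]
print(cyk_parse(observed))  # True: the stream parses as one action phrase
```

In a full system, the terminal symbols would come from the perception modules (grasp type, tracked objects, detected movements), and the parse tree, rather than a boolean, would supply the sub-event structure used for reasoning, prediction, and planning.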
