Hierarchies for Embodied Action Perception

During social interactions, humans are capable of initiating and responding to rich and complex social actions despite incomplete world knowledge and physical, perceptual and computational constraints. This capability relies on action perception mechanisms that exploit regularities in observed goal-oriented behaviours to generate robust predictions and reduce the workload of sensing systems. We argue that three factors are fundamental to achieving this capability. First, human knowledge is frequently hierarchically structured, both in the perceptual and execution domains. Second, human perception is an active process driven by current task requirements and context; this is particularly important when the perceptual input is complex (e.g. human motion) and the agent has to operate under embodiment constraints. Third, learning is at the heart of action perception mechanisms, underlying the agent’s ability to add new behaviours to its repertoire. Based on these factors, we review multiple instantiations of a hierarchically organised, biologically inspired framework for embodied action perception, demonstrating its flexibility in addressing the rich computational contexts of action perception and learning in robotic platforms.
