Contextual action recognition and target localization with an active allocation of attention on a humanoid robot

Exploratory gaze movements are fundamental for gathering the most relevant information about a partner during social interactions. Inspired by the cognitive mechanisms underlying human social behaviour, we have designed and implemented a system for dynamic attention allocation that actively controls gaze movements during a visual action recognition task by exploiting its own action execution predictions. While observing a partner's reaching movement, our humanoid robot contextually estimates both the goal position of the partner's hand and the locations in space of the candidate targets. It does so while actively gazing around the environment in order to optimize the gathering of task-relevant information. Experimental results in a simulated environment show that active gaze control based on the internal simulation of actions provides a clear advantage over other action perception approaches, both in estimation precision and in the time required to recognize an action. Moreover, our model reproduces and extends experimental findings on human attention during action perception.
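
To make the idea concrete, below is a minimal, hypothetical Python sketch of the kind of active gaze strategy described above: a discrete belief over which candidate target the partner's hand is reaching for is updated from gaze-dependent (foveal vs. peripheral) observations, and each fixation is chosen to maximize the expected reduction in uncertainty. The class and function names, the Gaussian observation model, and the Monte Carlo gain estimate are all illustrative assumptions; the actual system relies on internal simulation of the robot's own actions rather than this simplified Bayesian stand-in.

```python
# Minimal sketch (not the authors' implementation): active gaze selection for
# inferring which of several candidate targets a partner is reaching towards.
# All names and the Gaussian observation model are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy of a discrete belief (in nats)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

class ReachGoalEstimator:
    """Maintains a belief over which candidate target the observed hand is moving towards."""

    def __init__(self, targets):
        self.targets = np.asarray(targets, dtype=float)      # (K, 2) candidate positions
        self.belief = np.full(len(targets), 1.0 / len(targets))

    def update(self, hand_pos, gaze_pos, fovea_radius=0.15,
               sigma_fovea=0.02, sigma_periphery=0.10):
        """Bayesian update: observation noise depends on distance from the gaze point,
        so foveating near the hand yields more informative evidence."""
        foveated = np.linalg.norm(hand_pos - gaze_pos) < fovea_radius
        sigma = sigma_fovea if foveated else sigma_periphery
        noisy_hand = hand_pos + rng.normal(0.0, sigma, size=2)
        # Likelihood of "moving towards target k": targets closer to the observed
        # hand position are treated as more likely reach goals.
        dists = np.linalg.norm(self.targets - noisy_hand, axis=1)
        likelihood = np.exp(-0.5 * (dists / (sigma + 0.05)) ** 2) + 1e-9
        self.belief = self.belief * likelihood
        self.belief /= self.belief.sum()

    def expected_information_gain(self, gaze_pos, hand_pos, n_samples=20):
        """Monte Carlo estimate of how much fixating gaze_pos would reduce uncertainty."""
        gains = []
        for _ in range(n_samples):
            sim = ReachGoalEstimator(self.targets)
            sim.belief = self.belief.copy()
            sim.update(hand_pos, gaze_pos)
            gains.append(entropy(self.belief) - entropy(sim.belief))
        return float(np.mean(gains))

def choose_fixation(estimator, hand_pos):
    """Pick the gaze location (the hand or a candidate target) with highest expected gain."""
    candidates = np.vstack([hand_pos[None, :], estimator.targets])
    gains = [estimator.expected_information_gain(c, hand_pos) for c in candidates]
    return candidates[int(np.argmax(gains))]

if __name__ == "__main__":
    targets = [(0.5, 0.4), (0.5, -0.4), (0.7, 0.0)]          # candidate object positions
    estimator = ReachGoalEstimator(targets)
    hand = np.array([0.0, 0.0])
    goal = np.array(targets[2])                              # true (hidden) reach goal
    for step in range(10):
        hand = hand + 0.1 * (goal - hand)                    # partner's hand approaches the goal
        gaze = choose_fixation(estimator, hand)              # active allocation of attention
        estimator.update(hand, gaze)
        print(f"step {step}: gaze={np.round(gaze, 2)}, belief={np.round(estimator.belief, 2)}")
```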
