Understanding human intention by connecting perception and action learning in artificial agents

To develop an advanced human-robot interaction system, it is important to first understand how human beings learn to perceive, think, and act in an ever-changing world. In this paper, we propose an intention understanding system based on an Object Augmented-Supervised Multiple Timescale Recurrent Neural Network (OA-SMTRNN) and demonstrate the effects of perception-action connected learning in an artificial agent, inspired by psychological and neurological phenomena in humans. We believe that action and perception are not isolated processes in human mental development, and argue that their psychological and neurological interactions can be replicated in a human-machine scenario. The proposed OA-SMTRNN consists of perception and action modules and a connection between them, built from supervised multiple timescale recurrent neural networks and a deep auto-encoder, respectively, and links perception and action to understand human intention. Our experimental results show the effects of perception-action connected learning and demonstrate that robots equipped with the OA-SMTRNN can understand human intention.
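
For readers unfamiliar with multiple timescale recurrent networks (Yamashita and Tani, 2008), the key mechanism is a leaky-integrator update in which each context unit has its own time constant. The sketch below is a minimal Python illustration, assuming illustrative layer sizes, time constants, and random weights rather than the paper's actual OA-SMTRNN configuration; the input vector x stands in for perceptual features, such as the object codes a deep auto-encoder would supply in the object-augmented setting.

```python
import numpy as np

def mtrnn_step(u, x, W_rec, W_in, tau):
    """One leaky-integrator update. Units with a small time constant tau
    react quickly to input; units with a large tau change slowly and can
    carry intention-level context across a whole action sequence."""
    y = np.tanh(u)                               # current unit activations
    du = -u + W_rec @ y + W_in @ x               # recurrent drive + input drive
    return u + du / tau                          # per-unit timescale

rng = np.random.default_rng(0)
n_fast, n_slow, n_in = 20, 10, 8                 # illustrative sizes (assumed)
n = n_fast + n_slow
tau = np.concatenate([np.full(n_fast, 2.0),      # fast context units
                      np.full(n_slow, 30.0)])    # slow context units
W_rec = rng.normal(scale=0.1, size=(n, n))
W_in  = rng.normal(scale=0.1, size=(n, n_in))

u = np.zeros(n)
for t in range(100):
    x = rng.normal(size=n_in)  # stand-in for a perceptual/object feature vector
    u = mtrnn_step(u, x, W_rec, W_in, tau)
```

Fast units (small tau) track moment-to-moment motion, while slow units (large tau) integrate over the whole sequence; this separation of timescales is what lets a single recurrent network represent both action primitives and the slower, intention-level structure that spans them.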
