Communication with Speech and Gestures: Applications of Recurrent Neural Networks to Robot Language Learning

Recurrent neural networks have recently shown significant potential in a range of language applications, from natural language processing to language modelling. This paper introduces a research effort to use such networks to develop and evaluate natural language acquisition on a humanoid robot. The problem is twofold. First, the gesture-word combination stage observed in infants will be used to model the transition from single-word to multi-word utterances. Second, action learning will be connected with language learning. For the former, the long short-term memory (LSTM) architecture will be implemented, whilst for the latter, multiple timescale recurrent neural networks (MTRNN) will be used. This will allow the two architectures to be compared, highlighting the strengths and shortcomings of each with respect to the language learning problem. The main research efforts, challenges and expected outcomes are described.
