Training an Interactive Humanoid Robot Using Multimodal Deep Reinforcement Learning

Training robots to perceive, act, and communicate across multiple modalities remains a challenging problem, particularly when robots are expected to learn efficiently from small sets of example interactions. We describe a learning approach as a step in this direction, in which we teach a humanoid robot to play the game of noughts and crosses. Given that multiple multimodal skills can be trained for this game, we focus our attention on training the robot to perceive the game state and to interact within the game. Our multimodal deep reinforcement learning agent perceives multimodal features and produces verbal and non-verbal actions while playing. Experimental results in simulation show that the robot can learn to win or draw up to 98% of its games. A pilot test of the proposed multimodal system for the targeted game, which integrates speech, vision, and gestures, shows that reasonable and fluent interactions can be achieved with the proposed approach.
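The core of the approach above is a reinforcement learning agent that learns a game-playing policy from reward alone. As a rough, self-contained illustration of that idea only (not the authors' system, which uses deep networks over multimodal features), the sketch below trains a tabular Q-learning player for noughts and crosses against a random opponent; all function names, reward values, and hyperparameters here are illustrative assumptions.

```python
import random

EMPTY, X, O = 0, 1, 2
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    """Return X or O if a line is complete, else None."""
    for i, j, k in LINES:
        if b[i] != EMPTY and b[i] == b[j] == b[k]:
            return b[i]
    return None

def moves(b):
    """Indices of empty cells."""
    return [i for i, c in enumerate(b) if c == EMPTY]

def train(episodes=30000, alpha=0.3, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning for player X against a uniformly random O."""
    rng = random.Random(seed)
    Q = {}  # maps (board_tuple, action) -> estimated value
    for _ in range(episodes):
        b = [EMPTY] * 9
        while True:
            s, acts = tuple(b), moves(b)
            # epsilon-greedy action selection for the learner (X)
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda m: Q.get((s, m), 0.0))
            b[a] = X
            if winner(b) == X:      # win: reward +1, terminal update
                Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (1.0 - Q.get((s, a), 0.0))
                break
            if not moves(b):        # draw: reward +0.5
                Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (0.5 - Q.get((s, a), 0.0))
                break
            b[rng.choice(moves(b))] = O  # random opponent replies
            if winner(b) == O:      # loss: reward -1
                Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (-1.0 - Q.get((s, a), 0.0))
                break
            if not moves(b):        # draw after opponent's move
                Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (0.5 - Q.get((s, a), 0.0))
                break
            # non-terminal: bootstrap from the best value in the next state
            s2 = tuple(b)
            target = gamma * max(Q.get((s2, m), 0.0) for m in moves(b))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return Q

def play_greedy(Q, rng):
    """Play one game greedily with the learned Q; return 'win'/'draw'/'loss'."""
    b = [EMPTY] * 9
    while True:
        s = tuple(b)
        a = max(moves(b), key=lambda m: Q.get((s, m), 0.0))
        b[a] = X
        if winner(b) == X:
            return 'win'
        if not moves(b):
            return 'draw'
        b[rng.choice(moves(b))] = O
        if winner(b) == O:
            return 'loss'
        if not moves(b):
            return 'draw'
```

After training, greedy play against the random opponent wins or draws the large majority of games, which is the same evaluation criterion the paper reports (win-or-draw rate); the paper's agent replaces the table with a deep network and the board state with perceived multimodal features.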
