Multi-modal integration of dynamic audiovisual patterns for an interactive reinforcement learning scenario

Robots in domestic environments are receiving increasing attention, especially in scenarios where they interact with parent-like trainers to dynamically acquire and refine knowledge. Reinforcement learning has been a prominent paradigm for dynamically learning new tasks. However, because the learning process can take excessive time, a promising extension incorporates an external parent-like trainer into the learning cycle to scaffold and speed up the apprenticeship with advice about which actions should be performed to achieve a goal. In interactive reinforcement learning, the uni-modal control interfaces proposed so far are often quite limited and do not take multiple sensor modalities into account. In this paper, we propose the integration of audiovisual patterns to provide advice to the agent using multi-modal information. In our approach, advice can be given using either speech, gestures, or a combination of both. We introduce a neural network-based approach that integrates multi-modal information from uni-modal modules based on their confidence. Results show that multi-modal integration leads to better performance of interactive reinforcement learning, with the robot learning faster and obtaining greater rewards than in uni-modal scenarios.
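The abstract describes fusing advice from uni-modal recognisers (speech and gesture) weighted by each module's confidence. As a minimal illustrative sketch of such confidence-based fusion (the names `fuse_advice`, `ACTIONS`, and the linear weighting scheme are assumptions for illustration, not the paper's neural network-based method):

```python
# Hypothetical sketch of confidence-weighted multi-modal advice fusion.
# Each uni-modal module returns per-action scores plus a confidence value;
# the fused score is a confidence-weighted average over the modules.

ACTIONS = ["grasp", "move_left", "move_right", "release"]  # illustrative action set

def fuse_advice(speech_scores, gesture_scores, speech_conf, gesture_conf):
    """Combine per-action scores from two uni-modal recognisers,
    weighting each module by its own confidence estimate.
    Returns the advised action, or None if neither module is confident."""
    total = speech_conf + gesture_conf
    if total == 0:
        return None  # no usable advice from either modality
    fused = [
        (speech_conf * s + gesture_conf * g) / total
        for s, g in zip(speech_scores, gesture_scores)
    ]
    best = max(range(len(ACTIONS)), key=lambda i: fused[i])
    return ACTIONS[best]

# Example: confident speech advice dominates an uncertain gesture reading.
advice = fuse_advice(
    speech_scores=[0.90, 0.05, 0.03, 0.02],
    gesture_scores=[0.10, 0.70, 0.10, 0.10],
    speech_conf=0.8,
    gesture_conf=0.2,
)
```

This also covers the uni-modal cases in the abstract: setting one module's confidence to zero reduces the fusion to the other modality alone.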
