Robot learning simultaneously a task and how to interpret human instructions

This paper presents an algorithm for bootstrapping shared understanding in a human-robot interaction scenario where the user teaches the robot a new task using teaching instructions that are initially unknown to it. In such cases, the robot must simultaneously estimate what the task is and what the instructions it receives from the user mean. We consider a scenario where a human teacher uses initially unknown spoken words whose meanings are either feedback (good/bad) or guidance (go left, go right, ...). We present computational results, within an inverse reinforcement learning framework, showing that a) the meaning of unknown and noisy teaching instructions can be learned at the same time as a new task, b) the acquired knowledge about instructions can be reused when learning new tasks, and c) even if the robot initially knows the meanings of some instructions, the use of additional unknown teaching instructions improves learning efficiency.
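The core idea, jointly scoring task hypotheses and instruction-meaning hypotheses by how consistently they explain the teacher's signals, can be illustrated in a toy form. Everything below (the 1-D world, the symbols `bleep`/`blorp`, the consistency score) is a hypothetical sketch for intuition, not the paper's actual inverse reinforcement learning algorithm:

```python
from itertools import product

# Toy setting: a robot on a 1-D line, actions -1/+1, and two candidate
# target positions (the unknown "task"). After each action the teacher
# utters one of two unknown symbols; each symbol consistently means
# either "correct" or "wrong", but the mapping is unknown to the robot.

def optimal_action(state, target):
    return 1 if target > state else -1

# Simulated interaction log of (state, action, symbol) triples.
# Ground truth, hidden from the learner: target = 5,
# "bleep" = correct, "blorp" = wrong.
log = []
state, true_target = 0, 5
for action in [1, 1, -1, 1, -1, 1]:
    correct = action == optimal_action(state, true_target)
    log.append((state, action, "bleep" if correct else "blorp"))
    state += action

def score(target, mapping):
    # Fraction of utterances consistent with this (task, meaning) pair.
    hits = 0
    for s, a, sym in log:
        correct = a == optimal_action(s, target)
        hits += (mapping[sym] == correct)
    return hits / len(log)

candidates = [2, 5]                       # candidate target positions
mappings = [{"bleep": True, "blorp": False},
            {"bleep": False, "blorp": True}]
# The true task paired with the true symbol meanings is the only
# hypothesis that explains every signal in the log.
best = max(product(candidates, mappings), key=lambda tm: score(*tm))
print(best[0], best[1]["bleep"])  # the consistent pair: target 5, bleep = correct
```

Only under the correct task hypothesis do the teacher's signals admit a fully consistent interpretation, which is why task and instruction meanings can be estimated at the same time; the paper carries this out probabilistically over reward functions and noisy spoken words rather than by exhaustive enumeration.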
