Interactively shaping robot behaviour with unlabeled human instructions

In this paper, we propose a framework that enables a human teacher to shape a robot's behaviour by interactively providing it with unlabeled instructions. We ground the meaning of the instruction signals in the task-learning process and, at the same time, use them to guide that process. We implement our framework as a modular architecture, named TICS (Task-Instruction-Contingency-Shaping), that combines different information sources: a predefined reward function, human evaluative feedback and unlabeled instructions. This approach provides a novel perspective on robotic task learning that lies between the Reinforcement Learning and Supervised Learning paradigms. We evaluate our framework both in simulation and with a real robot. The experimental results demonstrate the effectiveness of our framework in accelerating the task-learning process and in reducing the number of required teaching signals.
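
To make the idea of combining the three information sources more concrete, the following is a minimal illustrative sketch and not the TICS implementation. It assumes a tabular Q-learning agent, assumes that the task reward and the human evaluative feedback are simply summed, and assumes that the meaning of an unlabeled signal is grounded by accumulating the feedback received by the actions it co-occurred with. The class name `InstructionShapedAgent`, its methods and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


class InstructionShapedAgent:
    """Toy agent: Q-learning plus a learned signal -> action association.

    Illustrative only; every name and update rule here is an assumption.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)            # (state, action) -> value
        self.contingency = defaultdict(float)  # (signal, action) -> evidence
        self.rng = random.Random(seed)

    def interpret(self, signal):
        """Return the action most associated with a signal, or None if unknown."""
        scores = [self.contingency[(signal, a)] for a in self.actions]
        best = max(scores)
        if best <= 0:
            return None  # no positive evidence yet: the signal is still ungrounded
        return self.actions[scores.index(best)]

    def act(self, state, signal=None):
        """Follow a grounded instruction if there is one, else act epsilon-greedily."""
        if signal is not None:
            guess = self.interpret(signal)
            if guess is not None:
                return guess
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, feedback=0.0, signal=None):
        """One learning step from task reward, human feedback and an optional signal."""
        # Assumption: predefined reward and evaluative feedback are summed.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + feedback + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        # Assumption: feedback on the executed action is credited to the signal
        # that accompanied it, which gradually grounds the signal's meaning.
        if signal is not None:
            self.contingency[(signal, action)] += feedback


agent = InstructionShapedAgent(actions=["left", "right"])
# The teacher emits an unlabeled signal, the robot goes left and is praised.
agent.update("s0", "left", reward=1.0, next_state="s1", feedback=1.0, signal="sig_a")
chosen = agent.act("s0", signal="sig_a")
```

In this sketch the signal `"sig_a"` has no predefined label; it acquires a meaning only through the feedback the agent receives while learning the task, and once grounded it is used to choose actions directly.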
