Training a robot with evaluative feedback and unlabeled guidance signals

In this paper, we present a new method for training a robot through natural interaction, using evaluative feedback and unlabeled guidance signals. Feedback signals are mapped directly to reward values and used to learn both the task and the meaning of the guidance signals. The learned guidance signals are then used in turn to bootstrap task learning. We propose unlabeled guidance signals as an alternative to preprogrammed guidance. We evaluate our method both in simulation and on a real robot.
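
Below is a minimal Python sketch of the kind of loop the abstract describes, assuming a tabular Q-learning setting: evaluative feedback is mapped to a scalar reward, that reward drives both the task update and the learning of what each unlabeled guidance signal means, and the learned signal model then biases action selection. The class and method names, the feedback-to-reward mapping, and the signal-learning rule are illustrative assumptions, not the paper's implementation.

import random
from collections import defaultdict

class InteractiveLearner:
    """Sketch: evaluative feedback drives task learning and the learning of
    what unlabeled guidance signals mean; learned guidance then biases actions.
    (Illustrative assumption, not the paper's actual algorithm.)"""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)             # task values Q[(state, action)]
        self.signal_model = defaultdict(float)  # learned meaning of (signal, action)

    def feedback_to_reward(self, feedback):
        # Evaluative feedback mapped directly to a scalar reward (illustrative mapping).
        return {"good": 1.0, "bad": -1.0}.get(feedback, 0.0)

    def choose_action(self, state, guidance=None):
        # Epsilon-greedy over task values plus the learned interpretation of guidance.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        def score(a):
            s = self.q[(state, a)]
            if guidance is not None:
                s += self.signal_model[(guidance, a)]
            return s
        return max(self.actions, key=score)

    def update(self, state, action, feedback, next_state, guidance=None):
        r = self.feedback_to_reward(feedback)
        # Q-learning update driven by the human-provided reward.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (
            r + self.gamma * best_next - self.q[(state, action)])
        # The same reward also teaches the meaning of the unlabeled guidance signal:
        # a signal that accompanies rewarded actions becomes associated with them.
        if guidance is not None:
            self.signal_model[(guidance, action)] += self.alpha * (
                r - self.signal_model[(guidance, action)])

In a sketch like this, a guidance signal that the teacher gives before a correctly rewarded action gradually acquires a positive value for that action, so it can later steer action selection even when no explicit feedback is given.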
