A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans

As robots become pervasive in human environments, users need a way to convey new skills to them without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning. We instead aim to design a learning agent that elicits more natural and effective communication from the human trainer, while treating human feedback as discrete communication that depends probabilistically on the trainer's target policy. We conducted a user study in which participants trained a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. Results from 60 participants show how a learner can ground natural language commands and adapt its action-execution speed to learn more efficiently from human trainers. In particular, the agent's action-execution speed can be modulated to encourage more explicit feedback from the trainer in regions of the state space with high uncertainty. Our novel adaptive-speed agent outperforms fixed-speed agents on several measures of performance. We also investigate how instructions affect user performance and user preference across training conditions.
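The core mechanism described above, slowing action execution where the policy is uncertain so the trainer has time to intervene, can be illustrated with a minimal sketch. This is not the authors' implementation; the softmax-entropy uncertainty measure and the delay bounds are illustrative assumptions.

```python
import math

def policy_entropy(q_values, temperature=1.0):
    """Shannon entropy (nats) of a softmax policy over the Q-values.

    High entropy means the action values are close together, i.e. the
    agent is uncertain which action its trainer intends.
    """
    exps = [math.exp(q / temperature) for q in q_values]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def action_delay(q_values, min_delay=0.2, max_delay=2.0):
    """Map normalized policy entropy to an action-execution delay (seconds).

    Confident states execute quickly (min_delay); maximally uncertain
    states pause longest (max_delay), inviting explicit human feedback.
    The 0.2 s / 2.0 s bounds are assumed, not taken from the paper.
    """
    h = policy_entropy(q_values)
    h_max = math.log(len(q_values))  # entropy of a uniform policy
    frac = h / h_max if h_max > 0 else 0.0
    return min_delay + frac * (max_delay - min_delay)
```

For example, `action_delay([10.0, 0.0, 0.0])` is close to the minimum delay because one action clearly dominates, while `action_delay([0.0, 0.0, 0.0])` returns the maximum delay, since a uniform policy is maximally uncertain.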
