Improving Interactive Reinforcement Agent Planning with Human Demonstration

TAMER has proven to be a powerful interactive reinforcement learning method that allows ordinary people to teach and personalize an autonomous agent's behavior by providing evaluative feedback. However, a TAMER agent that plans with UCT, a Monte Carlo Tree Search strategy, can only update states along its path, which can incur a high learning cost, especially for a physical robot. In this paper, we propose to drive the agent's exploration along the optimal path and reduce the learning cost by initializing the agent's reward function via inverse reinforcement learning from demonstration. We test the proposed method in Grid World, an RL benchmark domain, with different discounts on human reward. Our results show that learning from demonstration allows a TAMER agent to learn a roughly optimal policy up to the deepest search and encourages the agent to explore along the optimal path. In addition, we find that learning from demonstration improves learning efficiency: it reduces the total feedback and the number of incorrect actions needed to obtain an optimal policy and increases the ratio of correct actions, allowing a TAMER agent to converge faster.
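The idea of seeding the reward model from a demonstration before any human feedback arrives can be illustrated with a minimal sketch. It is not the paper's implementation: the inverse reinforcement learning step is replaced by a crude stand-in that assigns a fixed bonus to demonstrated state-action pairs, UCT planning is replaced by myopic greedy action selection, and every name here (`init_reward_from_demo`, `tamer_update`, `rollout`, the 3x3 grid, the `bonus` and `alpha` values) is a hypothetical choice made for illustration.

```python
from collections import defaultdict

# Grid moves as (dx, dy); y grows downward. Hypothetical layout for this sketch.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def init_reward_from_demo(demo, bonus=1.0):
    """Seed a tabular human-reward model H(s, a) from (state, action) pairs.

    Stand-in for IRL: demonstrated pairs get `bonus`, all others stay at 0.
    """
    H = defaultdict(float)
    for state, action in demo:
        H[(state, action)] = bonus
    return H

def tamer_update(H, state, action, feedback, alpha=0.5):
    """Move H(s, a) toward the human's evaluative feedback signal."""
    H[(state, action)] += alpha * (feedback - H[(state, action)])

def greedy_action(H, state):
    """Pick the action with the highest predicted human reward (no lookahead)."""
    return max(ACTIONS, key=lambda a: H[(state, a)])

def rollout(H, start, goal, max_steps=20):
    """Follow the greedy policy from start until the goal or the step limit."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        dx, dy = ACTIONS[greedy_action(H, state)]
        state = (state[0] + dx, state[1] + dy)
        path.append(state)
    return path

# A demonstration from the top-left corner to the bottom-right of a 3x3 grid.
demo = [((0, 0), "right"), ((1, 0), "right"), ((2, 0), "down"), ((2, 1), "down")]
H = init_reward_from_demo(demo)
path = rollout(H, start=(0, 0), goal=(2, 2))

# Later human feedback can still override the demonstration-based prior.
tamer_update(H, (0, 0), "right", feedback=-1.0)
```

With the seeded model, the greedy rollout follows the demonstrated path before any feedback is given, which is the intuition behind starting exploration near the optimal path. The final `tamer_update` call shows that negative feedback still lowers a demonstrated pair's predicted reward.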
