Using Human Demonstrations to Improve Reinforcement Learning

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.

[1]  Peter Stone,et al.  Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.

[2]  Gregory Kuhlmann and Peter Stone and Raymond J. Mooney and Shavlik Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer , 2004, AAAI 2004.

[3]  Richard S. Sutton,et al.  Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.

[4]  Eric Fritzinger,et al.  Learning behavior fusion from demonstration , 2008 .

[5]  Daniel H. Grollman,et al.  Dogged Learning for Robots , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[6]  C. Boutilier,et al.  Accelerating Reinforcement Learning through Implicit Imitation , 2003, J. Artif. Intell. Res..

[7]  Jude W. Shavlik,et al.  Creating Advice-Taking Reinforcement Learners , 1998, Machine Learning.

[8]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[9]  Manuela M. Veloso,et al.  Probabilistic policy reuse in a reinforcement learning agent , 2006, AAMAS '06.

[10]  Peter Stone,et al.  Transfer Learning via Inter-Task Mappings for Temporal Difference Learning , 2007, J. Mach. Learn. Res..

[11]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[12]  Stefan Schaal,et al.  Robot Learning From Demonstration , 1997, ICML.

[13]  Jude W. Shavlik,et al.  Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another , 2005, ECML.

[14]  Peter Stone,et al.  Combining manual feedback with subsequent MDP reward signals for reinforcement learning , 2010, AAMAS.

[15]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[16]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[17]  Richard S. Sutton,et al.  Training and Tracking in Robotics , 1985, IJCAI.

[18]  Leslie Pack Kaelbling,et al.  Effective reinforcement learning for mobile robots , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[19]  Peter Stone,et al.  Cross-domain transfer for reinforcement learning , 2007, ICML '07.

[20]  Javier García,et al.  Probabilistic Policy Reuse for inter-task transfer learning , 2010, Robotics Auton. Syst..

[21]  Ian Frank,et al.  Soccer Server: A Tool for Research on Multiagent Systems , 1998, Appl. Artif. Intell..

[22]  Peter Stone,et al.  Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..

[23]  Peter Stone,et al.  Interactively shaping agents via human reinforcement: the TAMER framework , 2009, K-CAP '09.