Integrating reinforcement learning with human demonstrations of varying ability

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration, and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations, once transferred into a baseline policy for an agent and refined with reinforcement learning, significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning. It also studies the impact of demonstration quality and quantity, and the effect of combining demonstrations from multiple teachers. All three transfer methods yield statistically significant improvements in performance over learning without demonstrations. The best performance was achieved by combining the best demonstrations from two teachers.
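The pipeline described above (demonstrate, summarize the demonstrations as rules, transfer the rules into a baseline policy, refine with reinforcement learning) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from this work: the toy corridor environment, the majority-vote rule summary, the tabular Q-learning update, and the decaying rule-reuse probability are hypothetical stand-ins, and the sketch is not claimed to match any of the three transfer methods evaluated here.

```python
import random
from collections import Counter, defaultdict

N_STATES = 6       # hypothetical corridor: states 0..5, goal at state 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Toy deterministic environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done

def summarize(demos):
    """Summarize (state, action) demonstrations as one majority-vote rule per state."""
    votes = defaultdict(Counter)
    for s, a in demos:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def learn(rule, episodes=200, alpha=0.5, gamma=0.95, epsilon=0.1,
          reuse=0.9, reuse_decay=0.95, seed=0):
    """Tabular Q-learning that follows the demonstrated rule with a decaying probability."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            if s in rule and rng.random() < reuse:
                a = rule[s]                      # reuse the transferred rule
            elif rng.random() < epsilon:
                a = rng.choice(ACTIONS)          # explore
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])  # exploit learned values
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s, t = s2, t + 1
        reuse *= reuse_decay                     # rely less on the rule over time
    return q

# Hypothetical demonstrations: noisy at state 1, and silent about states 3 and 4.
demos = [(0, 1), (1, 1), (1, 0), (1, 1), (2, 1)]
rule = summarize(demos)
q = learn(rule)
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

In this sketch the rule only covers the states the teacher visited, so reinforcement learning both fills in the uncovered states and is free to override the rule as its reuse probability decays.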
