论文信息 - Integrating reinforcement learning with human demonstrations of varying ability

Integrating reinforcement learning with human demonstrations of varying ability

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity, as well as the effect of combining demonstrations from multiple teachers. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration. The best performance was achieved by combining the best demonstrations from two teachers.

Sonia Chernova | Matthew E. Taylor | Halit Bener Suay | S. Chernova

[1] B. Skinner,et al. Science and human behavior , 1953 .

[2] Richard S. Sutton,et al. Training and Tracking in Robotics , 1985, IJCAI.

[3] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[4] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .

[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[6] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.

[7] Stefan Schaal,et al. Robot Learning From Demonstration , 1997, ICML.

[8] Ian Frank,et al. Soccer Server: A Tool for Research on Multiagent Systems , 1998, Appl. Artif. Intell..

[9] Jude W. Shavlik,et al. Creating advice-taking reinforcement learners , 1998 .

[10] Ian Witten,et al. Data Mining , 2000 .

[11] Thomas de Quincey. [C] , 2000, The Works of Thomas De Quincey, Vol. 1: Writings, 1799–1820.

[12] Bruce Blumberg,et al. Integrated learning for interactive synthetic characters , 2002, SIGGRAPH.

[13] Pierre-Yves Oudeyer,et al. Robotic clicker training , 2002, Robotics Auton. Syst..

[14] Leslie Pack Kaelbling,et al. Effective reinforcement learning for mobile robots , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[15] PriceBob,et al. Accelerating reinforcement learning through implicit imitation , 2003 .

[16] C. Boutilier,et al. Accelerating Reinforcement Learning through Implicit Imitation , 2003, J. Artif. Intell. Res..

[17] Eric Wiewiora,et al. Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..

[18] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.

[19] Ben Tse,et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning , 2004, ISER.

[20] Gregory Kuhlmann and Peter Stone and Raymond J. Mooney and Shavlik. Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer , 2004, AAAI 2004.

[21] Peter Stone,et al. Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..

[22] Jude W. Shavlik,et al. Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another , 2005, ECML.

[23] Peter Stone,et al. Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.

[24] Manuela M. Veloso,et al. Probabilistic policy reuse in a reinforcement learning agent , 2006, AAMAS '06.

[25] Peter Stone,et al. Autonomous Learning of Stable Quadruped Locomotion , 2006, RoboCup.

[26] Daniel H. Grollman,et al. Dogged Learning for Robots , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.