Paper Information - Interactive Policy Learning through Confidence-Based Autonomy

Interactive Policy Learning through Confidence-Based Autonomy

We present Confidence-Based Autonomy (CBA), an interactive algorithm for policy learning from demonstration. The CBA algorithm consists of two components that take advantage of the complementary abilities of humans and computer agents. The first component, Confident Execution, enables the agent to identify states in which demonstration is required, to request a demonstration from the human teacher, and to learn a policy based on the acquired data. The algorithm selects demonstrations based on a measure of action selection confidence, and our results show that with Confident Execution the agent requires fewer demonstrations to learn the policy than when demonstrations are selected by a human teacher. The second algorithmic component, Corrective Demonstration, enables the teacher to correct any mistakes made by the agent through additional demonstrations in order to improve the policy and future task performance. CBA and its individual components are compared and evaluated in a complex simulated driving domain. The complete CBA algorithm results in the best overall learning performance, successfully reproducing the behavior of the teacher while balancing the tradeoff between the number of demonstrations and the number of incorrect actions during learning.
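The interplay of the two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier or a confidence measure, so the k-nearest-neighbor vote, the `threshold`, `k`, and `max_distance` parameters, and the class and method names below are all assumptions made for illustration.

```python
import math
from collections import Counter


class ConfidenceBasedLearner:
    """Illustrative sketch of a confidence-gated learner (hypothetical names)."""

    def __init__(self, threshold=0.6, k=3, max_distance=1.0):
        self.threshold = threshold        # assumed: minimum vote share needed to act alone
        self.k = k                        # assumed: number of neighbors that vote
        self.max_distance = max_distance  # assumed: beyond this, a state counts as unfamiliar
        self.data = []                    # stored demonstrations as (state, action) pairs

    def predict(self, state):
        """Return (action, confidence, distance to the nearest demonstration)."""
        if not self.data:
            return None, 0.0, math.inf
        ranked = sorted(self.data, key=lambda d: math.dist(d[0], state))
        nearest = ranked[: self.k]
        votes = Counter(action for _, action in nearest)
        action, count = votes.most_common(1)[0]
        # Dividing by k (not len(nearest)) keeps confidence low while data is scarce.
        confidence = count / self.k
        return action, confidence, math.dist(nearest[0][0], state)

    def add_demonstration(self, state, action):
        self.data.append((tuple(state), action))

    def step(self, state, teacher):
        """Confident Execution: act autonomously if confident, else request a demonstration.

        Returns (action, demonstration_requested).
        """
        action, confidence, distance = self.predict(state)
        if confidence < self.threshold or distance > self.max_distance:
            action = teacher(state)       # ask the human teacher
            self.add_demonstration(state, action)
            return action, True
        return action, False

    def correct(self, state, correct_action):
        """Corrective Demonstration: the teacher supplies the right action after a mistake."""
        self.add_demonstration(state, correct_action)
```

In this sketch, `step` would be called once per visited state with a `teacher` callable that maps a state to an action, and `correct` would be called whenever the teacher observes an incorrect autonomous action. Demonstration requests become rarer as stored demonstrations accumulate near the states the agent visits.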

Manuela M. Veloso | Sonia Chernova

[1] Paul E. Utgoff, et al. On integrating apprentice learning and reinforcement learning, 1996.

[2] Stefan Schaal, et al. Robot Learning From Demonstration, 1997, ICML.

[3] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[4] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[5] Tetsunari Inamura, Masayuki Inaba, Hirochika. Acquisition of Probabilistic Behavior Decision Model based on the Interactive Teaching Method, 2001.

[6] Leslie Pack Kaelbling, et al. Making Reinforcement Learning Work on Real Robots, 2002.

[7] C. Boutilier, et al. Accelerating Reinforcement Learning through Implicit Imitation, 2003, J. Artif. Intell. Res.

[8] Minoru Asada, et al. A Hierarchical Multi-module Learning System Based on Self-interpretation of Instructions by Coach, 2003, RoboCup.

[9] Maja J. Matarić, et al. A framework for learning from demonstration, generalization and practice in human-robot domains, 2003.

[10] Brett Browning, et al. Skill Acquisition and Use for a Dynamically-Balancing Soccer Robot, 2004, AAAI.

[11] Gordon Cheng, et al. Learning to Act from Observation and Practice, 2004, Int. J. Humanoid Robotics.

[12] David A. Cohn, et al. Improving generalization with active learning, 1994, Machine Learning.

[13] Andrea Lockerd Thomaz, et al. Tutelage and socially guided robot learning, 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[16] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.

[17] Chrystopher L. Nehaniv, et al. Teaching robots by moulding behavior and scaffolding the environment, 2006, HRI '06.

[18] Daniel H. Grollman, et al. Dogged Learning for Robots, 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[19] Manuela M. Veloso, et al. Confidence-based policy learning from demonstration using Gaussian mixture models, 2007, AAMAS '07.

[20] Brett Browning, et al. Learning by demonstration with critique from a human teacher, 2007, 2007 2nd ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[21] M. Stolle, et al. Knowledge Transfer Using Local Features, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.

[22] Manuela M. Veloso, et al. Learning equivalent action choices from demonstration, 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[23] Manuela M. Veloso, et al. Teaching collaborative multi-robot tasks through demonstration, 2008, Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots.

[24] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.