Learning Robot Motion Control from Demonstration and Human Advice

As robots become more commonplace in society, the need for tools that enable non-experts in robotics to develop control algorithms, or policies, will grow. Learning from Demonstration (LfD) offers one promising approach, in which the robot learns a policy from teacher executions of the task. In this work we present an algorithm that incorporates human teacher feedback to enable policy improvement from learner experience within an LfD framework. We present two implementations of this algorithm, which differ in the form of feedback the teacher provides. In the first, called Binary Critiquing (BC), the teacher provides a binary indication that highlights poorly performing portions of a learner execution. In the second, called Advice-Operator Policy Improvement (A-OPI), the teacher provides a correction on poorly performing portions of the learner execution. Most notably, these corrections are continuous-valued and therefore suited to low-level motion control action spaces. The algorithms are applied to two validation domains: one simulated and one on a Segway RMP platform. In both, policy performance improves with teacher feedback. Specifically, with BC the learner's execution success and efficiency come to exceed the teacher's. With A-OPI, task success and accuracy are similar or superior to the typical LfD approach of correcting behavior through additional teacher demonstrations.
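To make the two feedback mechanisms concrete, a minimal Python sketch follows. It assumes a low-level action space of translational and rotational speeds, and it stands in for the paper's actual policy derivation with a simple kernel-weighted average; the operator definitions, penalty factor, and function names are illustrative assumptions, not the published implementation.

```python
import numpy as np

# Hypothetical advice-operators: each maps a recorded low-level action
# to a corrected, continuous-valued action. The (v, w) action space
# (translational and rotational speed) and these specific operators are
# illustrative assumptions, not the paper's definitions.
ADVICE_OPERATORS = {
    "turn_tighter": lambda a: np.array([a[0], a[1] * 1.2]),  # scale rotation up
    "slow_down":    lambda a: np.array([a[0] * 0.8, a[1]]),  # scale translation down
}

def apply_advice(dataset, execution, segment, operator_name):
    """A-OPI-style feedback: apply an advice-operator to the flagged
    segment of a learner execution, synthesizing corrected state-action
    pairs that are added to the demonstration dataset."""
    op = ADVICE_OPERATORS[operator_name]
    start, end = segment
    for state, action in execution[start:end]:
        dataset.append((state, op(np.asarray(action, dtype=float))))
    return dataset

def apply_critique(weights, segment, penalty=0.5):
    """BC-style feedback: down-weight the flagged segment so its points
    contribute less when the policy is re-derived. The multiplicative
    penalty is an assumed form of the credit assignment."""
    start, end = segment
    for i in range(start, end):
        weights[i] *= penalty
    return weights

def policy(dataset, query_state, h=1.0):
    """Kernel-weighted average over the dataset, a simple stand-in for
    whatever regression technique derives the actual policy."""
    states = np.array([s for s, _ in dataset], dtype=float)
    actions = np.array([a for _, a in dataset], dtype=float)
    d2 = np.sum((states - np.asarray(query_state, dtype=float)) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * h ** 2))
    return (w[:, None] * actions).sum(axis=0) / w.sum()
```

In use, the teacher would flag a segment of a practice run, the flagged pairs would be corrected (A-OPI) or down-weighted (BC), and the policy would then be re-derived from the updated dataset before the next practice execution.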
