Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot

Task demonstration is an effective technique for developing robot motion control policies. As tasks become more complex, however, demonstration can become more difficult. In this work, we introduce an algorithm that uses corrective human feedback to build a policy able to perform a novel task, by combining simpler policies learned from demonstration. While some demonstration-based learning approaches do adapt policies with execution experience, few provide corrections within low-level motion control domains or use corrections to link multiple demonstrated policies. Here we introduce Feedback for Policy Scaffolding (FPS), an algorithm that first evaluates and corrects the execution of motion primitive policies learned from demonstration, and then corrects and enables the execution of a more complex task constructed from these primitives. Key advantages of building a policy from demonstrated primitives are the potential to reuse a primitive policy within multiple complex policies, the faster development of those policies, and the ability to develop complex policies for which full demonstration is difficult. Under our algorithm, policy reuse is assisted by human teacher feedback, which also improves policy performance. Within a simulated robot motion control domain we validate that, using FPS, a policy for a novel task is successfully built from motion primitives learned from demonstration. We show that feedback both aids and enables policy development, improving policy performance in success rate, execution speed, and efficiency.
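
To make the two phases described above concrete, the following is a minimal Python sketch of an FPS-style loop. It is an illustration under stated assumptions, not the authors' implementation: the 1-nearest-neighbor policy stands in for whatever regression the paper uses, and the names PrimitivePolicy, scaffold, select_primitive, teacher, and env are all hypothetical.

    """Minimal sketch of a Feedback for Policy Scaffolding (FPS)-style loop.

    Hypothetical illustration only: the policy representation and all
    names here are assumptions, not the published implementation.
    """
    import numpy as np


    class PrimitivePolicy:
        """State-to-action mapping learned from demonstration data.

        states:  array of shape (n_samples, state_dim)
        actions: array of shape (n_samples, action_dim)
        """

        def __init__(self, states, actions):
            self.states = np.asarray(states, dtype=float)
            self.actions = np.asarray(actions, dtype=float)

        def predict(self, state):
            # Nearest-neighbor lookup stands in for the regression
            # presumably used in practice (e.g., locally weighted learning).
            dists = np.linalg.norm(self.states - state, axis=1)
            return self.actions[np.argmin(dists)]

        def incorporate_correction(self, state, corrected_action):
            # Phase 1 of FPS: a teacher correction is folded back into the
            # demonstration set, so the primitive is refined in place.
            self.states = np.vstack([self.states, state])
            self.actions = np.vstack([self.actions, corrected_action])


    def scaffold(primitives, select_primitive, teacher, env, horizon=100):
        """Phase 2 sketch: execute a complex task by chaining primitives,
        refining each with corrective teacher feedback along the way.

        select_primitive(state) returns an index into `primitives`;
        teacher(state, action) returns a corrected action, or None if the
        proposed execution is acceptable.
        """
        state = env.reset()
        for _ in range(horizon):
            policy = primitives[select_primitive(state)]
            action = policy.predict(state)
            correction = teacher(state, action)
            if correction is not None:
                policy.incorporate_correction(state, correction)
                action = correction
            state = env.step(action)
        return primitives

Because corrections are folded directly into the demonstration set, each teacher interaction both repairs the current execution and permanently refines the primitive, which is one plausible reading of how feedback supports both policy improvement and primitive reuse in the abstract.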
