Self-improvement of learned action models with learned goal models

We introduce a new method for robots to autonomously improve skills acquired through Learning from Demonstration. In previous work, we introduced a method to learn both an action model for executing a skill and a goal model for monitoring its execution. In this paper, we show how to use the learned goal model to improve the learned action model autonomously, without further user interaction: trajectories are sampled from the action model and executed on the robot, the goal model labels each execution as a success or a failure, and the successful trajectories are used to update the action model. We also introduce an adaptive sampling method to speed up convergence. We show through both simulation and real-robot experiments that our method can repair a failed action model.
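The loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: the Gaussian action model, threshold goal model, and `improve` routine are toy stand-ins, and the paper's adaptive sampling scheme is omitted.

```python
import random

class ActionModel:
    """Toy 1-D action model: a Gaussian over a scalar action parameter."""
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def sample(self):
        return random.gauss(self.mean, self.std)

    def update(self, successes):
        # Refit the model to the successful samples (maximum likelihood mean).
        self.mean = sum(successes) / len(successes)

class GoalModel:
    """Toy goal model: an execution succeeds if its outcome lands near the goal."""
    def __init__(self, goal, tol):
        self.goal, self.tol = goal, tol

    def is_success(self, outcome):
        return abs(outcome - self.goal) < self.tol

def improve(action_model, goal_model, execute, iterations=200):
    """Sample trajectories, execute them, label with the goal model,
    and refit the action model on the successful ones."""
    successes = [a for a in (action_model.sample() for _ in range(iterations))
                 if goal_model.is_success(execute(a))]
    if successes:
        action_model.update(successes)
    return action_model
```

Starting from a failed model (mean far from the goal), the goal model filters the sampled executions so that only successes drive the update, pulling the action model toward the goal region without any further demonstrations.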
