Relational reinforcement learning with guided demonstrations

Abstract

Model-based reinforcement learning is a powerful paradigm for learning tasks in robotics. However, in-depth exploration is usually required and the actions have to be known in advance. We therefore propose a novel algorithm that integrates the option of requesting teacher demonstrations, so that new domains can be learned with fewer action executions and no previous knowledge. Demonstrations allow new actions to be learned and greatly reduce the amount of exploration required, but since the teacher's time is considered more valuable than the robot's, they are requested only when they are expected to yield a significant improvement. Moreover, selecting the appropriate action to demonstrate is not an easy task, so some guidance is provided to the teacher. The rule-based model is analyzed to determine the parts of the state that may be incomplete, and to provide the teacher with a set of candidate problems for which a demonstration is needed. Rule analysis is also used to find better alternative models and to complete subgoals before requesting help, thereby minimizing the number of demonstrations requested. These improvements were evaluated in a set of experiments that included domains from the International Planning Competition and a robotic task. Adding teacher demonstrations and rule analysis reduced the amount of exploration required by up to 60% in some domains, and improved the success ratio by 35% in others.
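The core decision the abstract describes, whether to act from the learned model, keep exploring, or pay the cost of asking the teacher, can be summarized as a small selection function. The following Python sketch is only illustrative and is not the paper's implementation; every name (plan, candidate_gaps, expected_improvement, request_demo, explore) and the 0.5 threshold are hypothetical placeholders standing in for the planning and rule-analysis machinery described in the paper.

from typing import Any, Callable, Optional, Sequence

def choose_action(
    plan: Callable[[Any], Optional[Sequence[Any]]],
    candidate_gaps: Callable[[Any], Sequence[Any]],
    expected_improvement: Callable[[Sequence[Any]], float],
    request_demo: Callable[[Sequence[Any]], Any],
    explore: Callable[[Any], Any],
    state: Any,
    threshold: float = 0.5,
) -> Any:
    """Pick the next action for the current state.

    Mirrors the decision sketched above: act from the learned rule-based
    model when a plan exists; otherwise analyze the rules for parts that
    look incomplete and either ask the teacher for a demonstration (if the
    expected improvement justifies the cost of the teacher's time) or keep
    exploring autonomously.
    """
    steps = plan(state)
    if steps:
        # The current model already explains how to reach the goal.
        return steps[0]
    gaps = candidate_gaps(state)  # rule analysis: possibly incomplete rules or state parts
    if gaps and expected_improvement(gaps) >= threshold:
        # Guided request: the teacher chooses among the proposed problems.
        return request_demo(gaps)
    # Otherwise fall back to autonomous exploration.
    return explore(state)

In this reading, completing subgoals before requesting help corresponds to preferring the plan and explore branches whenever the model can still make progress on its own, so the demonstration branch is reached only when the expected gain outweighs the teacher's effort.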
